*****************************************
Setting the OMP_NUM_THREADS environment variable for each process to 1 by default, to avoid overloading your system; please tune the variable further for optimal performance in your application as needed.
*****************************************
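The default of one OpenMP thread per process can be overridden before the workers start heavy numeric work. A minimal sketch, assuming 4 local workers (an illustrative number, not taken from this log); the variable must be set before libraries that read it are imported:

```python
import multiprocessing
import os

# Split the machine's cores evenly across a hypothetical 4 local workers;
# the launcher would otherwise pin OMP_NUM_THREADS to 1 for each of them.
workers = 4
threads = max(1, multiprocessing.cpu_count() // workers)
os.environ["OMP_NUM_THREADS"] = str(threads)
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

The same effect can be had by exporting OMP_NUM_THREADS in the shell before invoking the launcher; the right value is typically physical cores divided by processes per node.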
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
[YES]cpu_adam .....................cpu_adam cpu_adam [OKAY][YES]............... .....................[YES] [OKAY][YES]...... [OKAY]fused_adam ...... .............[OKAY] [YES] ......fused_adam [OKAY]............. fused_adam[YES] fused_lamb.............fused_adam [YES] ...... ................................ [YES][OKAY][YES][OKAY] ............fused_lamb [OKAY][OKAY]............. fused_lamb [YES].............fused_lamb ......[YES]............. [OKAY]......[YES] sparse_attn [OKAY] .................. [OKAY][NO] ....... [OKAY] transformersparse_attn ........................ sparse_attn[YES] [NO] sparse_attn............ ...... [NO] ...................[OKAY] .......[OKAY][NO] [OKAY]stochastic_transformer....... transformer [OKAY].transformer............ [YES] ............[YES]transformer ......[YES].................. [OKAY] ......[OKAY][YES] [OKAY]...... stochastic_transformer[OKAY] . stochastic_transformer[YES] .stochastic_transformer...... [YES].[OKAY] ......[YES] [OKAY] ...... [OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- op name---------------------------------------------------------------------------------------------------- op name ................ op name................op nameinstalled installed.................................. installed compatible.. installed .. compatible--------------------------------------------------.. compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adam [OKAY] [YES] cpu_adam............... ......[YES]............... ......[OKAY]fused_adam [YES] ............. [OKAY]......[YES] ......[OKAY] fused_adam [OKAY] ............. [YES] ......fused_adam fused_lamb [OKAY] ............. ............. fused_adam[YES][YES] fused_lamb......................... ............. [OKAY][YES][OKAY] [YES] ...... ......[OKAY]fused_lamb [OKAY]............. [YES] fused_lamb...... [OKAY].............sparse_attn ............[YES] [NO]......sparse_attn .......[OKAY]............ sparse_attn [OKAY]............[NO] [NO]....... transformer ....... [OKAY] ............ [OKAY] [YES]transformer ......transformer............ ............[OKAY]sparse_attn [YES] [YES] ...... stochastic_transformer .................. [OKAY] .[OKAY] [NO][YES] stochastic_transformerstochastic_transformer............. ..[OKAY][OKAY] [YES][YES] ............transformer [OKAY] [OKAY] ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja .................................... .................. 
.................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name installed................................................ ..installed installedinstalled compatible .. .. ..--------------------------------------------------compatible compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adamcpu_adam [OKAY] [YES]............... ............... ...... [YES][YES][OKAY] ............ fused_adam[OKAY][OKAY] ............. [YES] ......fused_adam [OKAY]............. [YES] fused_adamfused_adam...... fused_lamb[OKAY].......................... .............[YES] [YES] [YES] fused_lamb............ ...................[OKAY] [OKAY] [OKAY] [YES] ...... [OKAY] fused_lambfused_lamb .......................... [YES][YES] ............ sparse_attn[OKAY][OKAY] ............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer sparse_attn............transformer [YES]sparse_attn............ .................. [YES] [NO]............ [OKAY]............. [NO] [OKAY] [OKAY] ....... stochastic_transformer [OKAY]stochastic_transformer.transformer .[YES]transformer ............ ......[YES] ............ [OKAY][YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... ......[OKAY] [OKAY] ninjaninjaninjaninja .................................... .................. .................. 
[OKAY] [OKAY][OKAY] -------------------------------------------------- [OKAY] ----------------------------------------------------------------------------------------------------op name ................op nameop name -------------------------------------------------- ................................ installed installed..op name installed compatible .................. .. --------------------------------------------------installed compatiblecompatible ..-------------------------------------------------- -------------------------------------------------- compatible --------------------------------------------------cpu_adam ............... [YES] cpu_adamcpu_adam...... [OKAY] ............... cpu_adam ...............[YES] ...............[YES]...... [OKAY] [YES] ......fused_adam ......[OKAY]............. [OKAY][YES] ......fused_adam [OKAY]............. [YES] fused_adam...... fused_lamb [OKAY]fused_adam ............. ............. ............. [YES][YES] fused_lamb [YES]...... ................... ......[OKAY][OKAY] [YES] ......[OKAY]fused_lamb [OKAY] .............fused_lamb [YES]............. ......[YES] sparse_attn[OKAY]...... ............[OKAY] sparse_attn [NO] ................... [OKAY][NO] .......transformer [OKAY]sparse_attn............ [YES]............transformersparse_attn ......[NO]........................ [YES]....... [OKAY]......[OKAY] [OKAY][NO] transformerstochastic_transformer ....... ............ . stochastic_transformer[OKAY] [YES] .[YES] ......transformer[YES]...... ............ [OKAY][OKAY] ...... [YES][OKAY] stochastic_transformer ...... .[OKAY] [YES] ...... [OKAY]stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................ ................................ ................ installed installed installedinstalled .. ....compatible.. compatible compatible-------------------------------------------------- compatible utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam ............... [OKAY].............................. [YES][YES][YES] .................. fused_adam[OKAY][OKAY][OKAY] ............. [YES] ...... [OKAY] fused_adamfused_adam fused_lambfused_adam ............. .......................... ............. [YES][YES] [YES] [YES] ...... ............ ...... [OKAY][OKAY] [OKAY] [OKAY] fused_lamb fused_lambfused_lamb............. ..........................[YES] [YES]sparse_attn...... [YES] ......[OKAY]............ ......[OKAY][NO] [OKAY]....... [OKAY] transformer ............ sparse_attn[YES] sparse_attn ............ ...... ............ 
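The libaio warnings above are actionable. A minimal sketch of the suggested fix, assuming a yum-based system; the /opt/libaio prefix is illustrative, not taken from this log:

```shell
# Install the libaio development headers that DeepSpeed's async_io op needs
# (yum-based systems, per the warning above).
sudo yum install -y libaio-devel

# If libaio was instead built from source into a custom prefix, point the
# compiler and linker at it before the op is JIT-compiled.
# /opt/libaio below is a hypothetical install location.
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
```

After this, rerunning the job (or DeepSpeed's `ds_report` utility) should show async_io as compatible.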
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_iotransformer_inference ................. [NO][NO] .............. [OKAY][NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils --------------------------------------------------.................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... DeepSpeed general environment info:torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.2 1.8.2 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']deepspeed info deepspeed info................... ...................0.5.5+58a8e13, 58a8e13, master 0.5.5+58a8e13, 58a8e13, masterdeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_iotransformer_inference .. [NO] ...................... [OKAY][NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizertransformer_inference .............. ..[NO] ....... [OKAY] [NO] ....... [OKAY]-------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found........ [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ [NO] ....... [NO] async_io ............... transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. transformer_inference[YES] ........ [NO][OKAY] ....... [OKAY] quantizer .............. [NO]utils ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] [WARNING]  async_io: please install the libaio-devel package with yum utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY]async_io ............... [NO] .......utils ..................[NO] [YES] ...... [OKAY] quantizertransformer_inference ................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ [NO] ....... [NO] async_io ............... transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY]transformer_inference .. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]utils .................. [YES] ......-------------------------------------------------- [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
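The environment report above is printed once per rank in dotted `key .... value` lines, which is why the raw stream interleaves and duplicates it. A small hypothetical helper (`parse_report` is not part of DeepSpeed, just a sketch for working with these logs) can recover a clean report as a dict:

```python
import re

def parse_report(text: str) -> dict:
    """Parse DeepSpeed-style 'key .... value' report lines into a dict.

    Lines that do not contain a run of filler dots are ignored.
    """
    result = {}
    for line in text.splitlines():
        # key, a run of 2+ dots used as visual filler, then the value
        m = re.match(r"^(.*?)\s*\.{2,}\s+(.*)$", line.strip())
        if m and m.group(1):
            result[m.group(1)] = m.group(2)
    return result

report = """torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2"""
parsed = parse_report(report)
print(parsed["torch version"], parsed["nvcc version"])
```

This is only a log-post-processing convenience; the same information can be regenerated directly with DeepSpeed's `ds_report` command.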
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1795509.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type ..................................
GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... 
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-11-01 17:38:42,115] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
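The `> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)` line above follows from the `make_vocab_size_divisible_by` (128) and `tensor_model_parallel_size` (4) arguments: the tokenizer's 50257 entries are rounded up to a multiple of 128 × 4 = 512 so the embedding table splits evenly across the tensor-parallel ranks (with pipeline 4 and data-parallel 4, that is also what gives `world_size` 64). A sketch of that rounding, assuming this is the rule being applied (the hypothetical helper below reproduces the logged numbers):

```python
def pad_vocab_size(vocab_size, make_divisible_by=128, tp_size=4):
    """Round vocab_size up so each tensor-parallel shard is equal and aligned."""
    multiple = make_divisible_by * tp_size           # 512 here
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab_size(50257))           # 50688
print(pad_vocab_size(50257) - 50257)   # 431 dummy tokens, as in the log
```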
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/pybind11/include helpers.cpp -o helpers.cpython-38-x86_64-linux-gnu.so
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 6.236 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF scaled_upper_triang_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -o scaled_upper_triang_masked_softmax.o
[2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o
[3/3] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_upper_triang_masked_softmax_cuda.so
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF scaled_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -o scaled_masked_softmax.o
[2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced
[3/3] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_masked_softmax_cuda.so
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF layer_norm_cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -o layer_norm_cuda.o
[2/3]
/gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -maxrregcount=50 -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o
[3/3] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so
Loading extension module fused_mix_prec_layer_norm_cuda...
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 162.296 seconds time to initialize megatron (seconds): 186.731 [after megatron is initialized] datetime: 2021-11-01 17:41:30 building GPT model ... 
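The warning above comes from `torch.utils.cpp_extension`, which compares the C++ compiler it resolves (here the generic `c++`) against the compiler this PyTorch build was made with (`g++`). A minimal sketch of the usual workaround, assuming `g++` is installed and on `PATH`: point the `CXX` environment variable at `g++` before any fused kernel is JIT-compiled.

```python
import os

# Sketch of a common workaround, not the fix used in this run:
# torch.utils.cpp_extension reads the CXX environment variable when
# picking the C++ compiler (falling back to "c++"), so setting it to
# g++ before the first extension build avoids the warning above.
# Assumes g++ is installed and on PATH.
os.environ["CXX"] = "g++"
```

In a SLURM-style launch this would typically be exported in the job script before the training command, so every rank inherits it.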
[2021-11-01 17:41:30,781] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-01 17:41:30,782] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-01 17:41:30,782] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.57 GB, percent = 21.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-01 17:41:31,304] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
[2021-11-01 17:41:31,677] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-01 17:41:31,678] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB Max_MA 0.21 GB CA 0.22 GB Max_CA 0 GB
[2021-11-01 17:41:31,678] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.82 GB, percent = 21.8%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
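The topology dump above assigns each ProcessCoord(pipe, data, model) a global rank in row-major order, with the pipeline coordinate outermost and the tensor-model coordinate innermost. A small sketch (grid sizes 4×4×4 are read off the log; the formula is inferred from the printed mapping) reproduces it:

```python
# Sketch reproducing the rank layout printed above: 4 pipeline stages x
# 4 data-parallel replicas x 4 tensor-model-parallel ranks = 64 processes.
# The row-major formula below is inferred from the printed
# ProcessCoord -> rank mapping, pipe outermost, model innermost.
PIPE, DATA, MODEL = 4, 4, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DATA + data) * MODEL + model

# Spot-checks against entries visible in the log:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(2, 1, 1) == 37
assert coord_to_rank(3, 3, 3) == 63
```

This layout is why ranks 0-15 form pipeline stage 0, 16-31 stage 1, and so on in the per-stage parameter counts that follow.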
[2021-11-01 17:41:31,697] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-01 17:41:31,770] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-01 17:41:31,770] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-01 17:41:31,770] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-01 17:41:31,771] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-01 17:41:31,772] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-01 17:41:31,772] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-01 17:41:31,772] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-01 17:41:31,772] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-01 17:41:31,772] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-01 17:41:31,772] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-01 17:41:32,076] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-01 17:41:32,077] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB Max_MA 0.35 GB CA 0.59 GB Max_CA 1 GB
[2021-11-01 17:41:32,077] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,104] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-01 17:41:32,104] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.59 GB CA 0.89 GB Max_CA 1 GB
[2021-11-01 17:41:32,105] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,105] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-01 17:41:32,128] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-01 17:41:32,129] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.49 GB CA 0.89 GB Max_CA 1 GB
[2021-11-01 17:41:32,129] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-01 17:41:32,129] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-01 17:41:32,129] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-01 17:41:32,129] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile":
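The ZeRO stage-1 shard sizes printed above follow directly from the per-stage parameter counts: each data-parallel rank holds 1/4 of its pipeline stage's parameters, split across the two param groups shown in each "sizes[...]" line. A quick arithmetic check (values read off the log):

```python
# Sketch verifying the ZeRO stage-1 partition sizes logged above: the two
# shard sizes on each "Rank: N ... sizes[...]" line sum to the stage's
# parameter count divided by the 4-way data parallelism.
DATA_PARALLEL = 4  # from the log's 4 data-parallel replicas

# (stage parameter count, shard sizes printed for ranks in that stage)
cases = [
    (75592704, (18874368, 23808)),   # middle stages, e.g. rank 39
    (105739264, (26411008, 23808)),  # first stage, e.g. rank 0
    (105743360, (26411008, 24832)),  # last stage, e.g. rank 48
]
for stage_params, shard_sizes in cases:
    assert sum(shard_sizes) * DATA_PARALLEL == stage_params
```

This is what ZeRO stage 1 partitions: optimizer states (not gradients or weights) are sharded across the data-parallel group, which is why the shards divide evenly by 4 here.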
false }
[2021-11-01 17:41:32,129] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 8
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] world_size ................... 4
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_enabled ................. True
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_optimization_stage ......
1 [2021-11-01 17:41:32,131] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-01 17:41:32,132] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8 [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) 
TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,509] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
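The engine banner above can be sanity-checked with plain arithmetic. All numbers come from the log itself; the factor of 4 for tensor parallelism is an inference from the rank layout (ranks 0-3 share stage 0), not something the log states directly.

```python
# Gradient accumulation: global batch / (micro batch x data-parallel degree).
train_batch_size = 512        # "train_batch_size" in the DeepSpeed json above
micro_batch_size = 8          # "train_micro_batch_size_per_gpu"
dp_degree = 4                 # "world_size" in the config printout

micro_batches = train_batch_size // (micro_batch_size * dp_degree)
print(micro_batches)  # 16, matching "CONFIG: micro_batches=16" in the log

# TOTAL_PARAMS appears to equal the per-replica sum of STAGE_PARAMS over the
# four pipeline stages, times an assumed tensor-parallel degree of 4.
stage_params = [105739264, 75592704, 75592704, 105743360]  # stages 0..3
tp_degree = 4
total_params = sum(stage_params) * tp_degree
print(total_params)  # 1450672128, matching TOTAL_PARAMS in the log
```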
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints will not load any checkpoints and will start from random
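The load_checkpoint warning above means DeepSpeed could not find the plain-text `latest` tag file it uses to resolve the most recent checkpoint, so this run starts from random weights. A minimal sketch of that mechanism (the directory and tag name below are illustrative, not this run's real paths):

```python
import os
import tempfile

# DeepSpeed's save_checkpoint leaves a one-line file named "latest" in the
# checkpoint directory; its contents are the tag of the newest checkpoint
# (e.g. "global_step1000"). load_checkpoint with no explicit tag reads it.
ckpt_dir = tempfile.mkdtemp()   # stand-in for the real checkpoint directory
tag = "global_step1000"         # illustrative tag name

# What a successful save leaves behind:
with open(os.path.join(ckpt_dir, "latest"), "w") as f:
    f.write(tag)

# Roughly what an untagged load resolves the checkpoint to:
with open(os.path.join(ckpt_dir, "latest")) as f:
    resolved = f.read().strip()
print(resolved)  # global_step1000
```

When the file is absent, either the run never saved a checkpoint (a fresh start, as here) or an explicit tag should be passed when loading.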
time (ms) | load-checkpoint: 1.12
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.691828224
estimated model parameters: 1.209483264
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with 
the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings 
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will 
be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage 
hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.691828224estimated model parameters: 1.691828224estimated model parameters: 1.691828224estimated model parameters: 1.691828224 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last 
stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.691828224 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters: 1.691828224 estimated model parameters: 1.691828224 estimated model parameters: 1.691828224 estimated model parameters: 1.69189376estimated model parameters: 1.69189376 estimated model parameters: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.69189376 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model 
parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-01 17:41:32 > building train, validation, and test datasets ... 
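The warning above explains why the per-rank counts disagree: with pipeline parallelism (PP > 1) and tied input/output embeddings, both the first and last pipeline stages hold a copy of the embedding matrix, so a naive sum over stages counts it twice. A minimal sketch of that arithmetic, using assumed dimensions for a ~1.3B GPT-style model (24 layers, hidden size 2048, vocab 50257, sequence length 2048 are illustrative, not read from this log):

```python
# Hedged sketch: rough GPT parameter arithmetic, and why summing per-stage
# counts double-counts embeddings under pipeline parallelism.
# All dimensions below are assumptions for illustration.

def transformer_params(layers, hidden):
    # Per layer: attention projections (4*h^2) + MLP (8*h^2)
    # + biases/layernorms (~13*h); plus a final layernorm (2*h).
    return layers * (12 * hidden**2 + 13 * hidden) + 2 * hidden

def embedding_params(vocab, hidden, seq_len):
    # Token embeddings plus learned position embeddings.
    return vocab * hidden + seq_len * hidden

layers, hidden, vocab, seq_len = 24, 2048, 50257, 2048
body = transformer_params(layers, hidden)
emb = embedding_params(vocab, hidden, seq_len)

true_total = body + emb
# With tied embeddings and PP > 1, the first and last pipeline stages
# each hold the token embedding matrix, so summing over stages sees
# it twice:
naive_pp_sum = body + emb + vocab * hidden

print(f"body only:    {body / 1e9:.3f}B")
print(f"true total:   {true_total / 1e9:.3f}B")
print(f"naive PP sum: {naive_pp_sum / 1e9:.3f}B "
      f"(overcounts by {(naive_pp_sum - true_total) / 1e9:.3f}B)")
```

With these assumed dimensions the transformer body alone lands near 1.21B, which is consistent with the "without embeddings" figures in the log; the exact "with embeddings" figures also depend on vocabulary padding and parallelism layout, which this sketch does not model.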
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.143529 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.422 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.328 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-01 17:41:39
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 1777.43 | train/valid/test-data-iterators-setup: 5796.00
Number of parameters: 1.691828224 billion
Number of parameters: 1.69189376 billion
Number of parameters: 1.209483264 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-11-01 17:41:39
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 17] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3814.0 | max reserved: 3814.0
[Rank 1] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4126.0 | max reserved: 4126.0
[Rank 49] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 5210.0 | max reserved: 5210.0
[Rank 19] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 3] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4238.0 | max reserved: 4238.0
[Rank 51] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 6210.0 | max reserved: 6210.0
[Rank 35] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 48] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 6194.0 | max reserved: 6194.0
[Rank 2] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4238.0 | max reserved: 4238.0
[Rank 18] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 50] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 5338.0 | max reserved: 5338.0
[Rank 34] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
iteration 200/ 152972 | consumed samples: 6400 | consumed tokens: 13107200 | elapsed time per iteration (ms): 1327.3 | learning rate: 6.991E-06 | global batch size: 32 | lm loss: 8.590840E+00 | loss scale: 4096.0 | grad norm: 8688.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 16] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3894.0 | max reserved: 3894.0
[Rank 0] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4222.0 | max reserved: 4222.0
[Rank 33] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3894.0 | max reserved: 3894.0
[Rank 32] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3814.0 | max reserved: 3814.0
iteration 400/ 152972 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (ms): 1267.2 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 7.146135E+00 | loss scale: 4096.0 | grad norm: 4570.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 600/ 152972 | consumed samples: 19200 | consumed tokens: 39321600 | elapsed time per iteration (ms): 1278.3 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.750924E+00 | loss scale: 8192.0 | grad norm: 7901.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 800/ 152972 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (ms): 1276.6 | learning rate: 2.796E-05 | global batch size: 32 | lm loss: 6.545265E+00 | loss scale: 8192.0 | grad norm: 7316.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 1000/ 152972 | consumed samples: 32000 | consumed tokens: 65536000 | elapsed time per iteration (ms): 1276.7 | learning rate: 3.495E-05 | global batch size: 32 | lm loss: 6.260021E+00 | loss scale: 16384.0 | grad norm: 7908.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
 validation loss at iteration 1000 | lm loss value: 6.375450E+00 | lm loss PPL: 5.872494E+02 |
------------------------------------------------------------------------------------------------
iteration 1200/ 152972 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (ms): 1447.9 | learning rate: 4.194E-05 | global batch size: 32 | lm loss: 6.374382E+00 | loss scale: 16384.0 | grad norm: 8613.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 1400/ 152972 | consumed samples: 44800 | consumed tokens: 91750400 | elapsed time per iteration (ms): 1279.1 | learning rate: 4.893E-05 | global batch size: 32 | lm loss: 6.279016E+00 | loss scale: 16384.0 | grad norm: 14029.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 18:14:19,464] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/mp_rank_00_model_states.pt
[2021-11-01 18:14:19,477] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/mp_rank_01_model_states.pt
[2021-11-01 18:14:19,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_05_optim_states.pt
[2021-11-01 18:14:19,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-01 18:14:19,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2021-11-01 18:14:19,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_08_optim_states.pt
[2021-11-01 18:14:19,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2021-11-01 18:14:19,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_04_optim_states.pt
[2021-11-01 18:14:19,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-01 18:14:19,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,888] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-01 18:14:19,993] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,019] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,033] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,054] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,061] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,063] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_13_optim_states.pt successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1206.98 iteration 1600/ 152972 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (ms): 1299.0 | learning rate: 5.592E-05 | global batch size: 32 | lm loss: 6.404619E+00 | loss scale: 32768.0 | grad norm: 21620.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 1800/ 152972 | consumed samples: 57600 | consumed tokens: 117964800 | elapsed time per iteration (ms): 1286.6 | learning rate: 6.291E-05 | global batch size: 32 | lm loss: 5.957126E+00 | loss scale: 32768.0 | grad norm: 19930.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-01 18:25:04,050] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[6.990524562409547e-05, 6.990524562409547e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 2000 loss: 5.5920 iter time (s): 0.001 samples/sec: 50642.485 iteration 2000/ 152972 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (ms): 1284.4 | learning rate: 6.991E-05 | global batch size: 32 | lm loss: 5.872133E+00 | loss scale: 65536.0 | grad norm: 24708.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 2000 | lm loss value: 5.820941E+00 | lm loss PPL: 3.372893E+02 | 
------------------------------------------------------------------------------------------------ iteration 2200/ 152972 | consumed samples: 70400 | consumed tokens: 144179200 | elapsed time per iteration (ms): 1456.7 | learning rate: 7.690E-05 | global batch size: 32 | lm loss: 5.789141E+00 | loss scale: 65536.0 | grad norm: 44878.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2400/ 152972 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (ms): 1288.4 | learning rate: 8.385E-05 | global batch size: 32 | lm loss: 5.601571E+00 | loss scale: 65536.0 | grad norm: 55717.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2600/ 152972 | consumed samples: 83200 | consumed tokens: 170393600 | elapsed time per iteration (ms): 1298.1 | learning rate: 9.081E-05 | global batch size: 32 | lm loss: 5.861208E+00 | loss scale: 32768.0 | grad norm: 28652.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2800/ 152972 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (ms): 1292.5 | learning rate: 9.780E-05 | global batch size: 32 | lm loss: 5.622683E+00 | loss scale: 32768.0 | grad norm: 21679.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3000/ 152972 | consumed samples: 96000 | consumed tokens: 196608000 | elapsed time per iteration (ms): 1297.3 | learning rate: 1.048E-04 | global batch size: 32 | lm loss: 5.453251E+00 | loss scale: 32768.0 | grad norm: 13109.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 3000 | lm loss value: 5.292018E+00 | lm loss PPL: 1.987441E+02 | ------------------------------------------------------------------------------------------------ 
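The figures in these progress lines can be cross-checked against each other (a standalone sketch, not part of the training code): the reported "lm loss PPL" is simply exp of the "lm loss value", and the ratio of consumed tokens to consumed samples implies a sequence length of 2048 tokens per sample.

```python
import math

# Values copied from the validation log lines above.
val_loss_1000, ppl_1000 = 6.375450, 5.872494e+02   # validation at iteration 1000
val_loss_3000, ppl_3000 = 5.292018, 1.987441e+02   # validation at iteration 3000

# lm loss PPL == exp(lm loss value), up to print rounding.
assert abs(math.exp(val_loss_1000) - ppl_1000) < 0.05
assert abs(math.exp(val_loss_3000) - ppl_3000) < 0.05

# consumed tokens / consumed samples gives the per-sample sequence length.
assert 13_107_200 // 6_400 == 2048      # iteration 200
assert 196_608_000 // 96_000 == 2048    # iteration 3000
```

The same arithmetic holds for every iteration line in this section, so it is a quick way to spot a corrupted or mistranscribed metric.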
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 18:47:43,971] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/mp_rank_01_model_states.pt
[2021-11-01 18:47:44,023] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/mp_rank_00_model_states.pt
[2021-11-01 18:47:44,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_3_mp_rank_09_optim_states.pt
[... further "zero checkpoint saved" lines (18:47:44,384 to 18:47:44,568), one per remaining zero_pp_rank_{0-3}_mp_rank_{00-15} optimizer state, elided ...]
[2021-11-01 18:47:44,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 18:47:44,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1185.16 iteration 3200/ 152972 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (ms): 1462.9 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 5.273650E+00 | loss scale: 65536.0 | grad norm: 38824.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3400/ 152972 | consumed samples: 108800 | consumed tokens: 222822400 | elapsed time per iteration (ms): 1286.4 | learning rate: 1.188E-04 | global batch size: 32 | lm loss: 4.597713E+00 | loss scale: 65536.0 | grad norm: 79233.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3600/ 152972 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (ms): 1296.7 | learning rate: 1.258E-04 | global batch size: 32 | lm loss: 3.693162E+00 | loss scale: 131072.0 | grad norm: 103393.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3800/ 152972 | consumed samples: 121600 | consumed tokens: 249036800 | elapsed time per iteration (ms): 1291.4 | learning rate: 1.327E-04 | global batch size: 32 | lm loss: 3.533896E+00 | loss scale: 131072.0 | grad norm: 74243.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-01 19:09:15,794] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=4, lr=[0.00013967068075694273, 0.00013967068075694273], 
mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 1.7512 iter time (s): 0.001 samples/sec: 50460.970
iteration 4000/ 152972 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (ms): 1288.4 | learning rate: 1.397E-04 | global batch size: 32 | lm loss: 3.362072E+00 | loss scale: 65536.0 | grad norm: 18497.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 4000 | lm loss value: 3.268240E+00 | lm loss PPL: 2.626507E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | consumed tokens: 277413888 | elapsed time per iteration (ms): 1508.5 | learning rate: 1.478E-04 | global batch size: 64 | lm loss: 3.377852E+00 | loss scale: 65536.0 | grad norm: 28091.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4400/ 152972 | consumed samples: 148256 | consumed tokens: 303628288 | elapsed time per iteration (ms): 1604.1 | learning rate: 1.618E-04 | global batch size: 64 | lm loss: 3.219010E+00 | loss scale: 65536.0 | grad norm: 16919.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 19:22:21,049] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/mp_rank_01_model_states.pt
[2021-11-01 19:22:21,100] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/mp_rank_00_model_states.pt
[2021-11-01 19:22:21,456] [INFO] [engine.py:2540:_save_zero_checkpoint]
zero checkpoint saved (all ZeRO partitions, zero_pp_rank_0-3 × mp_rank_00-15) under /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1112.04
iteration 4600/ 152972 |
consumed samples: 161056 | consumed tokens: 329842688 | elapsed time per iteration (ms): 1616.3 | learning rate: 1.756E-04 | global batch size: 64 | lm loss: 3.361848E+00 | loss scale: 32768.0 | grad norm: 34305.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4800/ 152972 | consumed samples: 173856 | consumed tokens: 356057088 | elapsed time per iteration (ms): 1607.5 | learning rate: 1.895E-04 | global batch size: 64 | lm loss: 3.095694E+00 | loss scale: 32768.0 | grad norm: 8177.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5000/ 152972 | consumed samples: 186656 | consumed tokens: 382271488 | elapsed time per iteration (ms): 1603.2 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.158372E+00 | loss scale: 32768.0 | grad norm: 8572.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 5000 | lm loss value: 3.023918E+00 | lm loss PPL: 2.057174E+01 |
------------------------------------------------------------------------------------------------
iteration 5200/ 152972 | consumed samples: 199456 | consumed tokens: 408485888 | elapsed time per iteration (ms): 1830.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.131141E+00 | loss scale: 65536.0 | grad norm: 17721.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5400/ 152972 | consumed samples: 212256 | consumed tokens: 434700288 | elapsed time per iteration (ms): 1602.8 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.115537E+00 | loss scale: 65536.0 | grad norm: 14115.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5600/ 152972 | consumed samples: 225056 | consumed tokens: 460914688 | elapsed time per iteration (ms): 1599.5 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.995638E+00 | loss scale: 131072.0 | grad norm: 41343.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5800/ 152972 | consumed samples: 237856 | consumed tokens: 487129088 | elapsed time per iteration (ms): 1594.3 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.052323E+00 | loss scale: 131072.0 | grad norm: 159959.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 20:03:09,034] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=9, lr=[0.00019999960451637578, 0.00019999960451637578], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 6000/ 152972 | consumed samples: 250656 | consumed tokens: 513343488 | elapsed time per iteration (ms): 1599.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.011719E+00 | loss scale: 65536.0 | grad norm: 15847.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 6000 loss: 2.7031 iter time (s): 0.001 samples/sec: 81062.888
------------------------------------------------------------------------------------------------
validation loss at iteration 6000 | lm loss value: 2.982520E+00 | lm loss PPL: 1.973749E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 20:03:54,444] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/mp_rank_00_model_states.pt
[2021-11-01 20:03:54,452] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/mp_rank_01_model_states.pt
[2021-11-01 20:03:54,827] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved (zero_pp_rank_*/mp_rank_* partitions) under /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-01 20:03:54,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-01 20:03:54,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-01 20:03:54,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-01 20:03:54,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-01 20:03:54,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-01 20:03:54,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-01 20:03:54,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-01 20:03:55,004] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-01 20:03:55,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-01 20:03:55,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-01 20:03:55,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-01 20:03:55,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-01 20:03:55,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-01 20:03:55,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-01 20:03:55,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_12_optim_states.pt successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1141.83 iteration 6200/ 152972 | 
consumed samples: 263456 | consumed tokens: 539557888 | elapsed time per iteration (ms): 1853.8 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.901359E+00 | loss scale: 65536.0 | grad norm: 15147.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6400/ 152972 | consumed samples: 281024 | consumed tokens: 575537152 | elapsed time per iteration (ms): 5426.9 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.894193E+00 | loss scale: 131072.0 | grad norm: 32218.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6600/ 152972 | consumed samples: 300224 | consumed tokens: 614858752 | elapsed time per iteration (ms): 1904.6 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.819356E+00 | loss scale: 131072.0 | grad norm: 20253.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6800/ 152972 | consumed samples: 319424 | consumed tokens: 654180352 | elapsed time per iteration (ms): 1902.2 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.835417E+00 | loss scale: 131072.0 | grad norm: 25013.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7000/ 152972 | consumed samples: 338624 | consumed tokens: 693501952 | elapsed time per iteration (ms): 1902.6 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.845205E+00 | loss scale: 262144.0 | grad norm: 54172.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 7000 | lm loss value: 2.723926E+00 | lm loss PPL: 1.524004E+01 |
------------------------------------------------------------------------------------------------
iteration 7200/ 152972 | consumed samples: 357824 | consumed tokens: 732823552 | elapsed time per iteration (ms): 2188.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.742239E+00 | loss scale: 262144.0 | grad norm: 42485.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7400/ 152972 | consumed samples: 377024 | consumed tokens: 772145152 | elapsed time per iteration (ms): 1899.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.788103E+00 | loss scale: 524288.0 | grad norm: 88162.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 21:03:14,781] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/mp_rank_00_model_states.pt
[2021-11-01 21:03:14,812] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/mp_rank_01_model_states.pt
[2021-11-01 21:03:15,205 ... 21:03:15,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/zero_pp_rank_{0..3}_mp_rank_{00..15}_optim_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1102.90
iteration 7600/ 152972 | consumed samples: 396224 | consumed tokens: 811466752 | elapsed time per iteration (ms): 1902.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.754191E+00 | loss scale: 131072.0 | grad norm: 24945.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7800/ 152972 | consumed samples: 420544 | consumed tokens: 861274112 | elapsed time per iteration (ms): 2150.3 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.751569E+00 | loss scale: 131072.0 | grad norm: 21880.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 21:20:57,637] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000,
skipped=12, lr=[0.0001999939570800071, 0.0001999939570800071], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 8000/ 152972 | consumed samples: 446144 | consumed tokens: 913702912 | elapsed time per iteration (ms): 2212.0 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.788481E+00 | loss scale: 131072.0 | grad norm: 22323.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 8000 loss: 2.8012 iter time (s): 0.001 samples/sec: 115833.903 ------------------------------------------------------------------------------------------------ validation loss at iteration 8000 | lm loss value: 2.689925E+00 | lm loss PPL: 1.473057E+01 | ------------------------------------------------------------------------------------------------ iteration 8200/ 152972 | consumed samples: 471744 | consumed tokens: 966131712 | elapsed time per iteration (ms): 2546.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.704028E+00 | loss scale: 262144.0 | grad norm: 48355.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8400/ 152972 | consumed samples: 497344 | consumed tokens: 1018560512 | elapsed time per iteration (ms): 2205.6 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.684679E+00 | loss scale: 131072.0 | grad norm: 22705.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8600/ 152972 | consumed samples: 522944 | consumed tokens: 1070989312 | elapsed time per iteration (ms): 2206.3 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.687691E+00 | loss scale: 65536.0 | grad norm: 9859.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8800/ 152972 | consumed samples: 552320 | consumed tokens: 1131151360 | elapsed time per iteration (ms): 2387.2 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.648720E+00 | loss scale: 65536.0 | grad norm: 
10153.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 9000/ 152972 | consumed samples: 584320 | consumed tokens: 1196687360 | elapsed time per iteration (ms): 2527.2 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.595457E+00 | loss scale: 65536.0 | grad norm: 8486.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 9000 | lm loss value: 2.544508E+00 | lm loss PPL: 1.273696E+01 | ------------------------------------------------------------------------------------------------ saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints [2021-11-01 22:01:50,427] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/mp_rank_00_model_states.pt [2021-11-01 22:01:50,465] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/mp_rank_01_model_states.pt [2021-11-01 22:01:50,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-01 22:01:50,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-01 22:01:50,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-01 
22:01:50,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-01 22:01:51,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_13_optim_states.pt
[2021-11-01 22:01:51,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-01 22:01:51,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-01 22:01:51,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_15_optim_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1087.59
 iteration     9200/  152972 | consumed samples: 616320 | consumed tokens: 1262223360 | elapsed time per iteration (ms): 2939.1 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.584026E+00 | loss scale: 131072.0 | grad norm: 18164.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9400/  152972 | consumed samples: 648320 | consumed tokens: 1327759360 | elapsed time per iteration (ms): 2554.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.627375E+00 | loss scale: 131072.0 | grad norm: 18730.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9600/  152972 | consumed samples: 683040 | consumed tokens: 1398865920 |
elapsed time per iteration (ms): 2682.6 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.597666E+00 | loss scale: 262144.0 | grad norm: 32685.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9800/  152972 | consumed samples: 721440 | consumed tokens: 1477509120 | elapsed time per iteration (ms): 2857.6 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.547980E+00 | loss scale: 262144.0 | grad norm: 33598.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 22:46:50,240] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=16, lr=[0.00019997091981206023, 0.00019997091981206023], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: 2.6857 iter time (s): 0.001 samples/sec: 135371.169
 iteration    10000/  152972 | consumed samples: 759840 | consumed tokens: 1556152320 | elapsed time per iteration (ms): 2856.9 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.548814E+00 | loss scale: 262144.0 | grad norm: 33355.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 10000 | lm loss value: 2.537272E+00 | lm loss PPL: 1.264513E+01 |
-------------------------------------------------------------------------------------------------
 iteration    10200/  152972 | consumed samples: 798240 | consumed tokens: 1634795520 | elapsed time per iteration (ms): 3293.9 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.511054E+00 | loss scale: 524288.0 | grad norm: 72033.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    10400/  152972 | consumed samples: 842720 | consumed tokens: 1725890560 | elapsed time per iteration (ms): 3153.9 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.503509E+00 | loss scale: 262144.0 | grad
norm: 32736.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 23:13:37,746] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/mp_rank_01_model_states.pt
[2021-11-01 23:13:37,769] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/mp_rank_00_model_states.pt
[2021-11-01 23:13:38,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-01 23:13:38,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_09_optim_states.pt
[2021-11-01 23:13:38,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_06_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_05_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-01 23:13:38,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 23:13:38,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1151.58 iteration 10600/ 152972 | consumed samples: 887520 | consumed tokens: 1817640960 | elapsed time per iteration (ms): 3175.6 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.470897E+00 | loss scale: 262144.0 | grad norm: 29970.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 10800/ 152972 | consumed samples: 932320 | consumed tokens: 1909391360 | elapsed time per iteration (ms): 3164.1 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.467487E+00 | loss scale: 131072.0 | grad norm: 16202.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 11000/ 152972 | consumed samples: 983360 | consumed tokens: 2013921280 | elapsed time per iteration (ms): 3472.6 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.705436E+00 | loss scale: 131072.0 | grad norm: 23590.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 11000 | lm loss value: 2.527613E+00 | lm loss PPL: 1.252358E+01 | 
-------------------------------------------------------------------------------------------------
iteration 11200/ 152972 | consumed samples: 1034560 | consumed tokens: 2118778880 | elapsed time per iteration (ms): 4020.5 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.485469E+00 | loss scale: 262144.0 | grad norm: 29278.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11400/ 152972 | consumed samples: 1088128 | consumed tokens: 2228486144 | elapsed time per iteration (ms): 3587.0 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 2.469676E+00 | loss scale: 262144.0 | grad norm: 30222.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11600/ 152972 | consumed samples: 1145728 | consumed tokens: 2346450944 | elapsed time per iteration (ms): 3785.3 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 2.433655E+00 | loss scale: 262144.0 | grad norm: 29033.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11800/ 152972 | consumed samples: 1203680 | consumed tokens: 2465136640 | elapsed time per iteration (ms): 3816.4 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.421053E+00 | loss scale: 524288.0 | grad norm: 54629.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 00:45:27,945] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=19, lr=[0.00019989706888811533, 0.00019989706888811533], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 12000/ 152972 | consumed samples: 1267680 | consumed tokens: 2596208640 | elapsed time per iteration (ms): 4119.3 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.395784E+00 | loss scale: 524288.0 | grad norm: 56777.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 12000 loss: 2.6347 iter time (s): 0.002 samples/sec: 154942.719
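Two internal relationships in the records above can be sanity-checked directly: the reported "lm loss PPL" in the validation blocks is simply exp of the lm loss value, and "consumed tokens" is "consumed samples" times a fixed 2048-token sequence length. A minimal check in Python, using values copied from the log lines above:

```python
import math

# Validation at iteration 11000: lm loss 2.527613E+00 -> reported PPL 1.252358E+01
lm_loss = 2.527613
ppl = math.exp(lm_loss)
assert abs(ppl - 12.52358) < 1e-3  # PPL = exp(lm loss)

# Iteration 10600: consumed samples 887520, consumed tokens 1817640960
# -> exactly 2048 tokens per sample, i.e. a sequence length of 2048
assert 1817640960 // 887520 == 2048
assert 887520 * 2048 == 1817640960
```

The same identities hold for every validation block and iteration record in this section.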
-------------------------------------------------------------------------------------------------
validation loss at iteration 12000 | lm loss value: 2.391201E+00 | lm loss PPL: 1.092661E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 00:47:41,437] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/mp_rank_01_model_states.pt
[2021-11-02 00:47:41,476] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/mp_rank_00_model_states.pt
[2021-11-02 00:47:41,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/zero_pp_rank_1_mp_rank_07_optim_states.pt
[... analogous "zero checkpoint saved" messages for the remaining zero_pp_rank_*_mp_rank_*_optim_states.pt files at global_step12000 ...]
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1175.72
iteration 12200/ 152972 | consumed samples: 1331680 | consumed tokens: 2727280640 | elapsed time per iteration (ms): 4799.6 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.389797E+00 | loss scale: 1048576.0 | grad norm: 102065.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12400/ 152972 | consumed samples: 1401888 | consumed tokens: 2871066624 | elapsed time per iteration (ms): 4448.0 | learning rate: 1.999E-04 | global batch size: 352 | lm loss: 2.393252E+00 | loss scale: 1048576.0 | grad norm: 104610.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12600/ 152972 | consumed samples: 1472768 | consumed tokens: 3016228864 | elapsed time per iteration (ms): 4474.1 | learning rate: 1.999E-04 | global batch size: 384 | lm loss: 2.360604E+00 | loss scale: 1048576.0 | grad norm: 107325.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12800/ 152972 | consumed samples: 1549568 | consumed tokens: 3173515264 | elapsed time per iteration (ms): 4778.3 | learning rate: 1.998E-04 | global batch size: 384 | lm loss: 2.363876E+00 | loss scale: 2097152.0 | grad norm: 213088.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13000/ 152972 | consumed samples: 1628544 | consumed tokens: 3335258112 | elapsed time per iteration (ms): 4880.2 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 2.356655E+00 | loss scale: 2097152.0 | grad norm: 200363.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 13000 | lm loss value: 2.326353E+00 | lm loss PPL: 1.024052E+01 |
-------------------------------------------------------------------------------------------------
iteration 13200/ 152972 | consumed samples: 1711744 | consumed tokens: 3505651712 | elapsed time per iteration (ms): 5944.0 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 2.333088E+00 | loss scale: 2097152.0 | grad norm: 187087.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13400/ 152972 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (ms): 5317.6 | learning rate: 1.998E-04 | global batch size: 448 | lm loss: 2.322860E+00 | loss scale: 1048576.0 | grad norm: 98604.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 02:49:55,953] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/mp_rank_01_model_states.pt
[2021-11-02 02:49:55,987] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/mp_rank_00_model_states.pt
[2021-11-02 02:49:56,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_11_optim_states.pt
[... analogous "zero checkpoint saved" messages for the remaining zero_pp_rank_*_mp_rank_*_optim_states.pt files at global_step13500 ...]
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 02:49:56,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 02:49:56,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 02:49:56,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 02:49:56,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 02:49:56,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 02:49:56,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 02:49:56,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 02:49:56,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1128.47
iteration 13600/ 152972 | consumed samples: 1890880 | consumed tokens: 3872522240 | elapsed time per iteration (ms): 5483.7 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 2.318233E+00 | loss scale: 1048576.0 | grad norm: 93357.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13800/ 152972 | consumed samples: 1986880 | consumed tokens: 4069130240 | elapsed time per iteration (ms): 5725.4 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 2.316036E+00 | loss scale: 262144.0 | grad norm: 25542.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 03:38:14,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=23, lr=[0.00019968259658442148, 0.00019968259658442148], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 14000/ 152972 | consumed samples: 2088384 | consumed tokens: 4277010432 | elapsed time per iteration (ms): 5984.3 | learning rate: 1.997E-04 | global batch size: 512 | lm loss: 2.304354E+00 | loss scale: 262144.0 | grad norm: 22072.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 14000 loss: 2.1247 iter time (s): 0.003 samples/sec: 169808.536
-------------------------------------------------------------------------------------------------
validation loss at iteration 14000 | lm loss value: 2.287617E+00 | lm loss PPL: 9.851433E+00 |
-------------------------------------------------------------------------------------------------
iteration 14200/ 152972 | consumed samples: 2190784 | consumed tokens: 4486725632 | elapsed time per iteration (ms): 7030.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.456157E+00 | loss scale: 16384.0 | grad norm: 27750.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14400/ 152972 | consumed samples: 2293184 | consumed tokens: 4696440832 | elapsed time per iteration (ms): 6019.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 2.471888E+00 | loss scale: 16384.0 | grad norm: 1553.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14600/ 152972 | consumed samples: 2395584 | consumed tokens: 4906156032 | elapsed time per iteration (ms): 6023.3 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 2.308169E+00 | loss scale: 16384.0 | grad norm: 1517.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14800/ 152972 | consumed samples: 2497984 | consumed tokens: 5115871232 | elapsed time per iteration (ms): 6021.9 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 2.292671E+00 | loss scale: 32768.0 | grad norm: 3138.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15000/ 152972 | consumed samples: 2600384 | consumed tokens: 5325586432 | elapsed time per iteration (ms): 6013.2 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 2.295139E+00 | loss scale: 32768.0 | grad norm: 3061.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 15000 | lm loss value: 2.267282E+00 | lm loss PPL: 9.653128E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 05:25:19,020] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/mp_rank_00_model_states.pt
[2021-11-02 05:25:19,065] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/mp_rank_01_model_states.pt
[2021-11-02 05:25:19,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_0_mp_rank_10_optim_states.pt
[... analogous "zero checkpoint saved" records, timestamps 05:25:19,456 to 05:25:19,643, for the remaining zero_pp_rank_{0-3} x mp_rank_{00-15} optimizer-state shards of global_step15000 ...]
[2021-11-02 05:25:19,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_1_mp_rank_15_optim_states.pt
[2021-11-02 05:25:19,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_2_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1170.78
iteration 15200/ 152972 | consumed samples: 2702784 | consumed tokens: 5535301632 | elapsed time per iteration (ms): 7027.8 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.293391E+00 | loss scale: 65536.0 | grad norm: 5879.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15400/ 152972 | consumed samples: 2805184 | consumed tokens: 5745016832 | elapsed time per iteration (ms): 6020.1 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.270763E+00 | loss scale: 65536.0 | grad norm: 6939.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15600/ 152972 | consumed samples: 2907584 | consumed tokens: 5954732032 | elapsed time per iteration (ms): 6011.7 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.262162E+00 | loss scale: 65536.0 | grad norm: 5802.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15800/ 152972 | consumed samples: 3009984 | consumed tokens: 6164447232 | elapsed time per iteration (ms): 5994.6 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 2.254617E+00 | loss scale: 131072.0 | grad norm: 12598.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 07:05:26,628] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=27, lr=[0.00019925032117609708, 0.00019925032117609708], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 16000 loss: 2.1199 iter time (s): 0.003 samples/sec: 171120.084
iteration 16000/ 152972 | consumed samples: 3112384 | consumed tokens: 6374162432 | elapsed time per iteration (ms): 5996.5 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 2.264790E+00 | loss scale: 131072.0 | grad norm: 11840.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 16000 | lm loss value: 2.238354E+00 | lm loss PPL: 9.377885E+00 |
-------------------------------------------------------------------------------------------------
iteration 16200/ 152972 | consumed samples: 3214784 | consumed tokens: 6583877632 | elapsed time per iteration (ms): 7006.8 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 2.260806E+00 | loss scale: 262144.0 | grad norm: 26196.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 16400/ 152972 | consumed samples: 3317184 | consumed tokens: 6793592832 | elapsed time per iteration (ms): 6025.1 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 2.253266E+00 | loss scale: 262144.0 | grad norm: 22496.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 07:58:57,103] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/mp_rank_00_model_states.pt
[2021-11-02 07:58:57,152] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/mp_rank_01_model_states.pt
[2021-11-02 07:58:57,508]
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_10_optim_states.pt
[... analogous "zero checkpoint saved" records, timestamps 07:58:57,516 to 07:58:57,640, for the remaining zero_pp_rank_{0-3} x mp_rank_{00-15} optimizer-state shards of global_step16500 ...]
[2021-11-02 07:58:57,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 07:58:57,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 07:58:57,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 07:58:57,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 07:58:57,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-02 07:58:57,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 07:58:57,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 07:58:57,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_12_optim_states.pt
[2021-11-02 07:58:57,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-02 07:58:57,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2021-11-02 07:58:57,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_14_optim_states.pt
[2021-11-02 07:58:57,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-02 07:58:57,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-02 07:58:57,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_03_optim_states.pt
 successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1087.73
 iteration    16600/  152972 | consumed samples: 3419584 | consumed tokens: 7003308032 | elapsed time per iteration (ms): 6040.9 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 2.247239E+00 | loss scale: 262144.0 | grad norm: 24826.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    16800/  152972 | consumed samples: 3521984 | consumed tokens: 7213023232 | elapsed time per iteration (ms): 6023.2 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 2.242061E+00 | loss scale: 524288.0 | grad norm: 43277.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17000/  152972 | consumed samples: 3624384 | consumed tokens: 7422738432 | elapsed time per iteration (ms): 6021.0 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 2.212666E+00 | loss scale: 524288.0 | grad norm: 53420.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 17000 | lm loss value: 2.208052E+00 | lm loss PPL: 9.097976E+00 |
-------------------------------------------------------------------------------------------------
 iteration    17200/  152972 | consumed samples: 3726784 | consumed tokens: 7632453632 | elapsed time per iteration (ms): 7048.9 | learning rate: 1.989E-04 | global batch size: 512 | lm loss: 2.228615E+00 | loss scale: 1048576.0 | grad norm: 92680.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17400/  152972 | consumed samples: 3829184 | consumed tokens: 7842168832 | elapsed time per iteration (ms): 6018.9 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 2.239331E+00 | loss scale: 1048576.0 | grad norm: 94849.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17600/  152972 | consumed samples: 3931584 | consumed tokens: 8051884032 | elapsed time per iteration (ms): 5998.8 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 2.226987E+00 | loss scale: 1048576.0 | grad norm: 100139.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17800/  152972 | consumed samples: 4033984 | consumed tokens: 8261599232 | elapsed time per iteration (ms): 6035.2 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 2.213699E+00 | loss scale: 1048576.0 | grad norm: 87903.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 10:32:56,352] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=28, lr=[0.000198635005451171, 0.000198635005451171], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 18000 loss: 2.1605 iter time (s): 0.003 samples/sec: 170850.458
 iteration    18000/  152972 | consumed samples: 4136384 | consumed tokens: 8471314432 | elapsed time per iteration (ms): 6029.8 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 2.212738E+00 | loss scale: 1048576.0 | grad norm: 86888.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 18000 | lm loss value: 2.182227E+00 | lm loss PPL: 8.866028E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 10:36:24,657] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/mp_rank_01_model_states.pt
[2021-11-02 10:36:24,690] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/mp_rank_00_model_states.pt [2021-11-02 10:36:25,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,055] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,055] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,185] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,221] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,224] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,241] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,245] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt
 successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1114.76
 iteration    18200/  152972 | consumed samples: 4238784 | consumed tokens: 8681029632 | elapsed time per iteration (ms): 7066.9 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 2.219932E+00 | loss scale: 2097152.0 | grad norm: 195940.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18400/  152972 | consumed samples: 4341184 | consumed tokens: 8890744832 | elapsed time per iteration (ms): 6037.0 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 2.213413E+00 | loss scale: 524288.0 | grad norm: 41907.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18600/  152972 | consumed samples: 4443584 | consumed tokens: 9100460032 | elapsed time per iteration (ms): 6015.2 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 2.201817E+00 | loss scale: 524288.0 | grad norm: 45378.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18800/  152972 | consumed samples: 4545984 | consumed tokens: 9310175232 | elapsed time per iteration (ms): 6018.8 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 2.199431E+00 | loss scale: 1048576.0 | grad norm: 97265.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    19000/  152972 | consumed samples: 4648384 | consumed tokens: 9519890432 | elapsed time per iteration (ms): 6017.8 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 2.184064E+00 | loss scale: 1048576.0 | grad norm: 84813.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
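The counters in the iteration lines above are internally consistent. A minimal sketch of the check, using values copied from the iteration 18800 and 19000 entries (note: the 2048-token sequence length is an inference from the tokens/samples ratio, not something this log states directly):

```python
# Values copied from the iteration 18800 and 19000 log entries above.
samples_18800, tokens_18800 = 4545984, 9310175232
samples_19000, tokens_19000 = 4648384, 9519890432
logged_global_batch_size = 512

# Tokens per sample, i.e. the (inferred) training sequence length.
seq_len = tokens_19000 // samples_19000

# Samples consumed per iteration over the 200 iterations between the
# two logged points; at this stage of training it matches the logged
# global batch size (earlier, during batch-size ramp-up, it would not).
samples_per_iter = (samples_19000 - samples_18800) // 200

print(seq_len, samples_per_iter)  # 2048 512
```

The same ratio holds at every logged point in this chunk, which is a quick way to confirm the counters were not garbled in transit.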
-------------------------------------------------------------------------------------------------
 validation loss at iteration 19000 | lm loss value: 2.160727E+00 | lm loss PPL: 8.677444E+00 |
-------------------------------------------------------------------------------------------------
 iteration 19200/ 152972 | consumed samples: 4750784 | consumed tokens: 9729605632 | elapsed time per iteration (ms): 7048.4 | learning rate: 1.982E-04 | global batch size: 512 | lm loss: 2.202084E+00 | loss scale: 1048576.0 | grad norm: 87504.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 19400/ 152972 | consumed samples: 4853184 | consumed tokens: 9939320832 | elapsed time per iteration (ms): 6028.4 | learning rate: 1.981E-04 | global batch size: 512 | lm loss: 2.198836E+00 | loss scale: 1048576.0 | grad norm: 93010.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 13:10:25,906] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/mp_rank_00_model_states.pt
[2021-11-02 13:10:25,944] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/mp_rank_01_model_states.pt
[2021-11-02 13:10:26,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_08_optim_states.pt
[2021-11-02 13:10:26,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2021-11-02 13:10:26,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,472] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-02 13:10:26,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 13:10:26,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_13_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_03_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 13:10:26,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_02_optim_states.pt
[2021-11-02 13:10:26,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_01_optim_states.pt
 successfully saved checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1209.82
 iteration 19600/ 152972 | consumed samples: 4955584 | consumed tokens: 10149036032 | elapsed time per iteration (ms): 6042.6 | learning rate: 1.980E-04 | global batch size: 512 | lm loss: 2.169471E+00 | loss scale: 524288.0 | grad norm: 46329.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 19679 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 13:28:28,379] [INFO]
[logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/mp_rank_00_model_states.pt [2021-11-02 13:28:28,423] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/mp_rank_01_model_states.pt [2021-11-02 13:28:28,776] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,782] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,782] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,785] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,790] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 13:28:28,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 13:28:28,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 13:28:28,919] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,922] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 13:28:28,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 13:28:28,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2021-11-02 13:28:28,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2021-11-02 13:28:28,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_15_optim_states.pt
[2021-11-02 13:28:28,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_14_optim_states.pt
[2021-11-02 13:28:28,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_13_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 13:28:28,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 19679 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1159.77
[exiting program after 1190.0665838718414 minutes] datetime: 2021-11-02 13:28:29
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... [OKAY] ..................[OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op nameop name ................ ................ ................................ installed installedinstalled.. installedcompatible ...... -------------------------------------------------- compatiblecompatiblecompatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adam[OKAY] ...............cpu_adamcpu_adam ...............[YES] [YES]..................... fused_adam ...... [YES] [OKAY] [OKAY]............. ...... [YES][OKAY] ...... [OKAY] fused_adam ............. fused_adamfused_lamb[YES] fused_adam ............. ...... .......................... [YES] [OKAY] [YES] [YES]...... ............ 
fused_lamb [OKAY][OKAY] [OKAY]............. [YES] ......fused_lamb [OKAY]fused_lamb ............. ............. [YES]sparse_attn [YES] ...... ............ ...... [OKAY] [NO] [OKAY] .......sparse_attn [OKAY]............ [NO] .......transformer [OKAY]............ [YES] sparse_attn......transformer sparse_attn ............[OKAY]........................ [NO][NO][YES] ....................stochastic_transformer [OKAY][OKAY][OKAY] . [YES] ...... stochastic_transformertransformer[OKAY]transformer ......................... [YES] [YES][YES] .................. [OKAY][OKAY][OKAY] stochastic_transformer stochastic_transformer. [YES]. ......[YES] [OKAY] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................. .................. ..................[OKAY] [OKAY] [OKAY] --------------------------------------------------[OKAY] ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- ................ op name op name installedop name ................ ................ .................. installed ..installedinstalledcompatible compatible--------------------------------------------------.. .. -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] cpu_adamcpu_adam[YES] .................................... [YES][OKAY][YES] ............ [OKAY]fused_adam [OKAY]............. fused_adam [YES]............. ......[YES] [OKAY]......fused_adam fused_adam[OKAY] fused_lamb ....................................... [YES]fused_lamb[YES][YES] ......................... ...... [YES] [OKAY][OKAY] [OKAY] ...... [OKAY]fused_lamb fused_lamb............. .............[YES] [YES]...... sparse_attn......[OKAY] ............[OKAY] sparse_attn [NO] ................... [NO][OKAY] ....... [OKAY] transformer sparse_attntransformer............ sparse_attn[YES]............ ............ ............ ......[NO] [YES] .......[NO] [OKAY] ......[OKAY]....... 
[OKAY] stochastic_transformer transformer [OKAY]. stochastic_transformer............ [YES].[YES] transformer[YES] .................. ............[OKAY] [YES][OKAY][OKAY] ...... [OKAY] stochastic_transformer .stochastic_transformer [YES]. ...... [YES][OKAY] ...... [OKAY] ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY] --------------------------------------------------[OKAY][OKAY]-------------------------------------------------- op name--------------------------------------------------op name -------------------------------------------------- ................ ................ op name installedop nameinstalled .................... ................compatible installedcompatibleinstalled-------------------------------------------------- --------------------------------------------------.... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......[YES] cpu_adamcpu_adam......[OKAY] ...............[OKAY]............... [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[YES] [YES]...... ......[OKAY] [OKAY]fused_adam fused_adam fused_lamb ............. ............. .............fused_lamb [YES] [YES] [YES] ................... ...... ......[YES][OKAY][OKAY] [OKAY]...... [OKAY]fused_lamb fused_lamb .......................... [YES][YES] ............ [OKAY][OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY] transformer sparse_attnsparse_attn............transformer ............[YES]........................ [NO] [YES][NO] ...... ....... ............. [OKAY][OKAY][OKAY][OKAY] transformertransformerstochastic_transformerstochastic_transformer .......................... [YES][YES] [YES] [YES].................. ......[OKAY][OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... 
......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name ................op name ................ 
................installed ................ installedinstalledinstalled.. ....compatible .. compatible--------------------------------------------------compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES] ...... [OKAY]cpu_adam cpu_adamcpu_adam ............... ..............................[YES] [YES]fused_adam...... [YES] ...................[OKAY] [OKAY]......[YES] [OKAY]...... [OKAY] fused_adam fused_adam.............fused_lamb ............. ............. fused_adam [YES][YES][YES] ............. ............[YES] ...... [OKAY]......[OKAY][OKAY] [OKAY] fused_lamb fused_lamb.............fused_lamb .............[YES] ............. sparse_attn[YES] ...... [YES].................. [OKAY] [NO]...... [OKAY] .......[OKAY] [OKAY] transformer ............ [YES]sparse_attn ...... ............sparse_attn[OKAY] ............sparse_attn[NO] stochastic_transformer[NO]............ ....... . [NO].......[YES][OKAY] [OKAY]...... ....... [OKAY] transformer [OKAY]transformer ........................ transformer[YES][YES] ........................ [OKAY][YES][OKAY] ...... [OKAY] stochastic_transformerstochastic_transformer ..stochastic_transformer [YES][YES]. ............[YES] [OKAY] [OKAY]...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name................ ................ ................ ................installed installed..installed installed.. compatible .. .. 
compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam...............cpu_adam ............... .............................. [YES] [YES]......[YES][YES] ...... ...... [OKAY] ......[OKAY][OKAY] [OKAY] fused_adamfused_adamfused_adamfused_adam .................................................... [YES][YES][YES][YES] ...... .................. [OKAY][OKAY][OKAY] [OKAY] fused_lamb fused_lambfused_lambfused_lamb............. ............. ............. [YES]............. [YES]......[YES] [YES] ...... [OKAY]...... ......[OKAY] [OKAY][OKAY] sparse_attn sparse_attn............sparse_attnsparse_attn [NO]............ ................... [NO]............ [NO] [OKAY] ....... [NO] .......[OKAY] transformer ....... [OKAY] ............ [OKAY]transformer [YES]transformer ..............................transformer [OKAY] [YES] [YES] ............ ...... ...... [YES]stochastic_transformer[OKAY] [OKAY] . ...... [YES]stochastic_transformer[OKAY] stochastic_transformer...... .[OKAY]. stochastic_transformer [YES][YES] ............. [OKAY][YES][OKAY] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ...................................................... [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ op nameninja op nameop name ................ .................................. ................ installedinstalled[OKAY] installed.. .. ..compatible--------------------------------------------------compatible compatible----------------------------------------------------------------------------------------------------op name --------------------------------------------------................ installed .. compatible cpu_adam--------------------------------------------------cpu_adam cpu_adam.............................. [YES]...............[YES] ............ [YES] [OKAY] cpu_adam [OKAY] ..................... fused_adam[OKAY][YES] ................... [YES][OKAY] ...... [OKAY] fused_adam fused_lamb............. fused_adam fused_adam............. [YES] [YES]................... ...................[OKAY][YES] [YES] [OKAY] ...... ...... [OKAY]fused_lamb ............. [YES] [OKAY]...... fused_lamb [OKAY]............. sparse_attn[YES] .................. [NO][OKAY] ....... [OKAY] transformer sparse_attnfused_lamb............ ............ [YES][NO] ............. [OKAY][OKAY]............. sparse_attn ............stochastic_transformer transformer [NO] . ............ ....... [YES] [YES] [YES] [OKAY] ...... ...... 
[OKAY][OKAY] transformer .................. [OKAY]stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name ................ ................ op name................ installed installed ................installed .... installed .. compatiblecompatible .. compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ...............[YES] cpu_adam[YES][YES]...... ......[OKAY] .....................[OKAY] [OKAY][YES] ...... [OKAY] fused_adam ............. [YES] fused_adamfused_adam...... ..........................[OKAY] [YES]fused_adam[YES] ......fused_lamb................... [OKAY].............[OKAY][YES] [YES]...... fused_lamb......fused_lamb ............. [OKAY].............[OKAY] [YES][YES] ............ [OKAY][OKAY] fused_lamb ............. [YES] sparse_attn...... ............ sparse_attn[OKAY][NO] sparse_attn ............................... [NO][OKAY] [NO] ....... .......[OKAY] transformer [OKAY] ............transformer [YES]transformersparse_attn............ .................. [YES] ............ [OKAY][YES] ...... [NO] ...... [OKAY] .......[OKAY] stochastic_transformer [OKAY].stochastic_transformer stochastic_transformer [YES] ........transformer [YES] [OKAY][YES] ............ ...... ...... [OKAY] [YES][OKAY] ...... 
[OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
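The report above notes that ops not built at install time are JIT-compiled with ninja on first use. To avoid JIT compilation latency at startup, DeepSpeed's `DS_BUILD_*` environment variables can prebuild ops during installation instead; a minimal sketch (op selection here mirrors the report above and assumes a CUDA toolkit matching your torch build is available):

```shell
# Prebuild selected DeepSpeed ops at install time instead of JIT-compiling them.
# DS_BUILD_OPS=1 would build all ops; per-op flags (shown) build a subset.
pip install ninja
DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed --no-cache-dir

# Re-check op status with DeepSpeed's bundled report tool:
ds_report
```

Prebuilding trades a longer install for deterministic startup; JIT compilation remains the default and is fine when the build toolchain is present on the training nodes.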
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... [OKAY].. [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] ....... .......[NO] [NO] transformer_inference ..transformer_inference [NO].. .......[NO] .......[OKAY] [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. 
[NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY]utils .................. [YES] ......utils [OKAY].................. [YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yumutils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.2 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ...............async_io [NO] ...................... async_io[NO][NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] .........transformer_inference [NO][OKAY] ......... [NO][OKAY] .......utils [OKAY].................. [YES] utils...... ..................[OKAY] utils [YES] ........................ [YES]quantizer[OKAY] .................... [OKAY][NO] ....... quantizer[OKAY] ..............quantizer [NO].............. .......[NO]-------------------------------------------------- [OKAY]....... 
[OKAY]
--------------------------------------------------
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
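The warning above prescribes its own fix. A minimal sketch of the two suggested remedies follows; the package name comes from the warning itself, while the custom install prefix `/opt/libaio` and the rebuild command are illustrative assumptions (DeepSpeed's `DS_BUILD_AIO=1` flag forces the async_io op to be prebuilt):

```shell
# Option 1: install the libaio development headers (RHEL/CentOS family, per the warning)
sudo yum install -y libaio-devel

# Option 2: libaio was built from source into a custom prefix (hypothetical path);
# point the compiler and linker at it, then rebuild DeepSpeed with the aio op
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
DS_BUILD_AIO=1 pip install deepspeed --no-cache-dir
```

After rebuilding, `ds_report` should show `async_io ... [OKAY]` instead of `[NO]`. Note the warning is cosmetic here: async_io is only needed for NVMe offload, which this run does not use.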
[OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................
1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... 
True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1809761.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... 
None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... 
False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 4 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ 
False seed ............................................ 1234 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 
0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) > initializing torch distributed ... **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** > setting tensorboard ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 4 > setting random seeds to 1234 ... 
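The numbers in the dump above are internally consistent, and it can be worth verifying that when auditing a run. A small sketch (outside the log; helper names are our own, not Megatron's, and the rampup step logic is a plausible reading of the `['32', '32', '2_000_000']` schedule rather than Megatron's exact code) reproducing the arithmetic behind the world size, the padded vocab, and the batch-size rampup:

```python
# Sanity-check values from the argument dump. Helper names are illustrative.

def padded_vocab_size(vocab_size, divisible_by, tensor_parallel_size):
    """Pad the vocab so each tensor-parallel shard is a multiple of `divisible_by`."""
    multiple = divisible_by * tensor_parallel_size  # 128 * 4 = 512
    return ((vocab_size + multiple - 1) // multiple) * multiple

def rampup_batch_size(consumed_samples, start, increment, ramp_samples, final):
    """Global batch size after `consumed_samples`, ramping from `start` to `final`
    in steps of `increment` spread over `ramp_samples` samples (assumed scheme)."""
    steps = final // increment - start // increment          # (512 - 32) / 32 = 15 steps
    samples_per_step = ramp_samples // steps
    return min(final, start + increment * (consumed_samples // samples_per_step))

# world size 64 = data-parallel 4 x tensor-parallel 4 x pipeline-parallel 4
assert 4 * 4 * 4 == 64

# "padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)"
assert padded_vocab_size(50257, 128, 4) == 50688
assert 50688 - 50257 == 431

# rampup from global batch 32 to 512 in increments of 32 over 2_000_000 samples
assert rampup_batch_size(0, 32, 32, 2_000_000, 512) == 32
assert rampup_batch_size(2_000_000, 32, 32, 2_000_000, 512) == 512
```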
[2021-11-03 09:44:05,097] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.341 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
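The seed line above can be reproduced from the base seed. In this Megatron version the model-parallel CUDA seed is the base seed plus a fixed offset plus the tensor-parallel rank, while data-parallel replicas keep the base seed; the offset value 2718 below is read off the logged numbers (3952 - 1234), treat it as an assumption rather than a stable contract:

```python
# Reproduce the seeds in the checkpointing.py log line (assumed scheme:
# base seed + fixed offset + tensor-parallel rank; offset inferred from the log).

def model_parallel_seed(seed, tensor_parallel_rank, offset=2718):
    """Per-tensor-rank seed so parallel shards draw different dropout masks."""
    return seed + offset + tensor_parallel_rank

def data_parallel_seed(seed):
    """Data-parallel replicas intentionally share the base seed."""
    return seed

# "model parallel seed: 3952 and data parallel seed: 1234" on tensor rank 0
assert model_parallel_seed(1234, 0) == 3952
assert data_parallel_seed(1234) == 1234
```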
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
>>> done with compiling and loading fused kernels. Compilation time: 25.497 seconds
time to initialize megatron (seconds): -16.999
[after megatron is initialized] datetime: 2021-11-03 09:44:31
building GPT model ...
[2021-11-03 09:44:31,043] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-03 09:44:31,044] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-11-03 09:44:31,045] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.24 GB, percent = 22.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-03 09:44:31,567] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2: <lambda>
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27: <lambda>
    28: <lambda>
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
[2021-11-03 09:44:31,937] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-03 09:44:31,938] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB  Max_MA 0.21 GB  CA 0.22 GB  Max_CA 0 GB
[2021-11-03 09:44:31,938] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.48 GB, percent = 22.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-03 09:44:31,957] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-03 09:44:32,028] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-03 09:44:32,028] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-03 09:44:32,029] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-03 09:44:32,030] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-03 09:44:32,030] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-03 09:44:32,030] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-03 09:44:32,030] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-03 09:44:32,030] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-03 09:44:32,030] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-03 09:44:32,030] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-03 09:44:32,335] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-03 09:44:32,336] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB  Max_MA 0.35 GB  CA 0.59 GB  Max_CA 1 GB
[2021-11-03 09:44:32,336] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,363] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-03 09:44:32,364] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB  Max_MA 0.59 GB  CA 0.89 GB  Max_CA 1 GB
[2021-11-03 09:44:32,364] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,364] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-03 09:44:32,396] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-03 09:44:32,397] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB  Max_MA 0.49 GB  CA 0.89 GB  Max_CA 1 GB
[2021-11-03 09:44:32,397] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,397] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-03 09:44:32,398] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-03 09:44:32,398] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-03 09:44:32,398] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-03 09:44:32,398] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   allreduce_always_fp32 ........ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   amp_enabled .................. False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   amp_params ................... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   checkpoint_tag_validation_enabled  True
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   checkpoint_tag_validation_fail  False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   curriculum_enabled ........... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   curriculum_params ............ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   dataloader_drop_last ......... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   disable_allgather ............ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   dump_state ................... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_enabled ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_gas_boundary_resolution  1
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_layer_num ......... 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_max_iter .......... 100
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_stability ......... 1e-06
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_tol ............... 0.01
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_verbose ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   elasticity_enabled ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_enabled ................. True
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_master_weights_and_gradients  False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_mixed_quantize .......... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   global_rank .................. 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_accumulation_steps .. 16
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_clipping ............ 1.0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_predivide_factor .... 1.0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   initial_dynamic_scale ........ 4096
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   loss_scale ................... 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   memory_breakdown ............. False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_legacy_fusion ...... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_name ............... None
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_params ............. None
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pld_enabled .................. False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pld_params ................... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   prescale_gradients ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_change_rate ......... 0.001
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_groups .............. 1
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_offset .............. 1000
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_period .............. 1000
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_rounding ............ 0
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_start_bits .......... 16
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_target_bits ......... 8
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_training_enabled .... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_type ................ 0
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_verbose ............. False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   scheduler_name ............... None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   scheduler_params ............. None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   sparse_attention ............. None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   sparse_gradients_enabled ..... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   steps_per_print .............. 2000
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_enabled .......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_output_path ......
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   train_batch_size ............. 512
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   train_micro_batch_size_per_gpu  8
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   use_quantizer_kernel ......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   wall_clock_breakdown ......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   world_size ................... 4
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_allow_untested_optimizer  False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_enabled ................. True
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_optimization_stage ...... 1
[2021-11-03 09:44:32,400] [INFO] [config.py:946:print]   json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-11-03 09:44:32,400] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 28
loading 4 zero partition checkpoints for rank 47
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 40
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 54
loading 4 zero partition checkpoints for rank 45
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 43
loading 4 zero partition checkpoints for rank 35
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 33
successfully loaded 4 ZeRO state_dicts for rank 30
loading 4 zero partition checkpoints for rank 21
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 44
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 23
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 18
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 34
successfully loaded 4 ZeRO state_dicts for rank 60
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 43
successfully loaded 4 ZeRO state_dicts for rank 19
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 17
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 24
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 52
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 27
loading 4 zero partition checkpoints for rank 30
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 37
successfully loaded 4 ZeRO state_dicts for rank 51
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 54
successfully loaded 4 ZeRO state_dicts for rank 61
loading 4 zero partition checkpoints for rank 13
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 25 successfully loaded 4 ZeRO state_dicts for rank 22 successfully loaded 4 ZeRO state_dicts for rank 15 loading 4 zero partition checkpoints for rank 19 successfully loaded 4 ZeRO state_dicts for rank 11 successfully loaded 4 ZeRO state_dicts for rank 56 successfully loaded 4 ZeRO state_dicts for rank 55 loading 4 zero partition checkpoints for rank 5 successfully loaded 4 ZeRO state_dicts for rank 63 loading 4 zero partition checkpoints for rank 17 successfully loaded 4 ZeRO state_dicts for rank 58 successfully loaded 4 ZeRO state_dicts for rank 57 loading 4 zero partition checkpoints for rank 27 loading 4 zero partition checkpoints for rank 12 loading 4 zero partition checkpoints for rank 60 successfully loaded 4 ZeRO state_dicts for rank 50 loading 4 zero partition checkpoints for rank 8 loading 4 zero partition checkpoints for rank 10 loading 4 zero partition checkpoints for rank 52 successfully loaded 4 ZeRO state_dicts for rank 0 loading 4 zero partition checkpoints for rank 7 successfully loaded 4 ZeRO state_dicts for rank 4 successfully loaded 4 ZeRO state_dicts for rank 14 loading 4 zero partition checkpoints for rank 53 loading 4 zero partition checkpoints for rank 48 loading 4 zero partition checkpoints for rank 22 loading 4 zero partition checkpoints for rank 49 loading 4 zero partition checkpoints for rank 59 loading 4 zero partition checkpoints for rank 51 successfully loaded 4 ZeRO state_dicts for rank 2 loading 4 zero partition checkpoints for rank 61 loading 4 zero partition checkpoints for rank 3loading 4 zero partition checkpoints for rank 1 loading 4 zero partition checkpoints for rank 11 loading 4 zero partition checkpoints for rank 63 loading 4 zero partition checkpoints for rank 15 loading 4 zero partition checkpoints for rank 55 loading 4 zero partition checkpoints for rank 56 loading 4 zero partition checkpoints for rank 57 loading 4 zero partition checkpoints for rank 58 loading 4 zero 
partition checkpoints for rank 50 loading 4 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 4 zero partition checkpoints for rank 4 loading 4 zero partition checkpoints for rank 14 loading 4 zero partition checkpoints for rank 2 successfully loaded 4 ZeRO state_dicts for rank 6 loading 4 zero partition checkpoints for rank 6 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints at iteration 19679 time (ms) | load-checkpoint: 3617.28 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as 
the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
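The config and engine lines above pin down the run's parallelism layout. A quick sketch (values read off the log; the variable names are illustrative, not DeepSpeed API) checks that the batch-size arithmetic is self-consistent:

```python
# Sanity-check of the batch-size arithmetic implied by the logged config.
micro_batch_size = 8    # "train_micro_batch_size_per_gpu" in the config
micro_batches = 16      # gradient accumulation steps, from the engine log
train_batch_size = 512  # "train_batch_size" in the config

# DeepSpeed requires train_batch_size ==
#   micro_batch_size * gradient_accumulation_steps * data_parallel_size,
# so the data-parallel degree must be:
data_parallel_size = train_batch_size // (micro_batch_size * micro_batches)
print(data_parallel_size)  # → 4

# With 64 ranks in total (RANK=0..63 in the log) and 4 pipeline stages
# (STAGE=0..3), the remaining factor is the tensor-parallel degree:
world_size = 64
pipeline_parallel_size = 4
tensor_parallel_size = world_size // (pipeline_parallel_size * data_parallel_size)
print(tensor_parallel_size)  # → 4
```

This matches the log: each pipeline stage is replicated across 16 ranks, i.e. a 4-way tensor-parallel by 4-way data-parallel grid.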
estimated model parameters: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[the same UserWarning is emitted once per rank; identical repeats omitted]
estimated model parameters: 1.209483264 / 1.691828224 / 1.69189376 (one line per rank; the values differ across pipeline stages because the first and last stages hold extra embedding copies; repeats omitted)
estimated model parameters without embeddings: 1.209483264 / 1.2095488 (one line per rank; repeats omitted)
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-03 09:44:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.165370 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.222 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.228 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.067 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-03 09:44:42
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 5387.97 | train/valid/test-data-iterators-setup: 5800.94
Number of parameters: 1.209483264 billion / 1.691828224 billion / 1.69189376 billion (one line per rank; repeats omitted)
Number of parameters without embeddings: 1.209483264 billion / 1.2095488 billion (one line per rank; repeats omitted)
[before the start of training step] datetime: 2021-11-03 09:44:43
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 48] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 51] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 19] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4742.0 | max reserved: 4742.0
[Rank 35] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
[Rank 3] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5728.0 | max reserved: 5728.0
[Rank 16] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4710.0 | max reserved: 4710.0
[Rank 32] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4278.0 | max reserved: 4278.0
[Rank 0] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5840.0 | max reserved: 5840.0
iteration 19800/ 152972 | consumed samples: 5057984 | consumed tokens: 10358751232 | elapsed time per iteration (ms): 6364.9 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 2.183335E+00 | loss scale: 524288.0 | grad norm: 37428.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 17] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 1] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5712.0 | max reserved: 5712.0
[Rank 33] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 49] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 34] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
[Rank 2] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5728.0 | max reserved: 5728.0
[Rank 50] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 18] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4790.0 | max reserved:
4790.0
[2021-11-03 10:18:18,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=33, lr=[0.0001978401275310349, 0.0001978401275310349], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 20000 loss: 2.0978 iter time (s): 0.003 samples/sec: 163228.888
iteration 20000/ 152972 | consumed samples: 5160384 | consumed tokens: 10568466432 | elapsed time per iteration (ms): 6225.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 2.163271E+00 | loss scale: 1048576.0 | grad norm: 85054.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 20000 | lm loss value: 2.121953E+00 | lm loss PPL: 8.347424E+00 |
-------------------------------------------------------------------------------------------------
iteration 20200/ 152972 | consumed samples: 5262784 | consumed tokens: 10778181632 | elapsed time per iteration (ms): 7254.7 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 2.156370E+00 | loss scale: 1048576.0 | grad norm: 78594.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20400/ 152972 | consumed samples: 5365184 | consumed tokens: 10987896832 | elapsed time per iteration (ms): 6174.3 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 2.153392E+00 | loss scale: 1048576.0 | grad norm: 86630.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20600/ 152972 | consumed samples: 5467584 | consumed tokens: 11197612032 | elapsed time per iteration (ms): 6141.6 | learning rate: 1.976E-04 | global batch size: 512 | lm loss: 2.144076E+00 | loss scale: 1048576.0 | grad norm: 90678.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20800/ 152972 | consumed samples: 5569984 | consumed tokens: 11407327232 | elapsed time per iteration (ms): 6134.5 | learning rate: 1.975E-04 | global batch size: 512 | lm loss: 2.166379E+00 | loss scale: 1048576.0 | grad norm: 94760.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 21000/ 152972 | consumed samples: 5672384 | consumed tokens: 11617042432 | elapsed time per iteration (ms): 6190.7 | learning rate: 1.974E-04 | global batch size: 512 | lm loss: 2.168400E+00 | loss scale: 524288.0 | grad norm: 46761.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 21000 | lm loss value: 2.114544E+00 | lm loss PPL: 8.285810E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 12:08:02,360] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/mp_rank_01_model_states.pt
[2021-11-03 12:08:02,369] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/mp_rank_00_model_states.pt
[2021-11-03 12:08:02,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-03 12:08:02,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_1_mp_rank_06_optim_states.pt
[2021-11-03 12:08:02,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_10_optim_states.pt
[further "zero checkpoint saved" INFO lines for the remaining ZeRO optimizer shards under global_step21000 (zero_pp_rank_0-3 x mp_rank_00-15) omitted]
 successfully saved checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1314.54
iteration 21200/ 152972 | consumed samples: 5774784 | consumed tokens: 11826757632 | elapsed time per iteration (ms): 7257.6 | learning rate: 1.973E-04 | global batch size: 512 | lm loss: 2.157695E+00 | loss scale: 524288.0 | grad norm: 46992.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 21400/ 152972 | consumed samples: 5877184 | consumed tokens: 12036472832 | elapsed time per iteration (ms): 6256.0 | learning rate: 1.972E-04 | global
batch size: 512 | lm loss: 2.165456E+00 | loss scale: 1048576.0 | grad norm: 106117.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21600/ 152972 | consumed samples: 5979584 | consumed tokens: 12246188032 | elapsed time per iteration (ms): 6213.1 | learning rate: 1.971E-04 | global batch size: 512 | lm loss: 2.222055E+00 | loss scale: 131072.0 | grad norm: 147716.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21800/ 152972 | consumed samples: 6081984 | consumed tokens: 12455903232 | elapsed time per iteration (ms): 6193.3 | learning rate: 1.970E-04 | global batch size: 512 | lm loss: 2.295924E+00 | loss scale: 131072.0 | grad norm: 12071.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-03 13:51:33,097] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=41, lr=[0.0001968677694572278, 0.0001968677694572278], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 22000 loss: 1.9122 iter time (s): 0.003 samples/sec: 167117.932 iteration 22000/ 152972 | consumed samples: 6184384 | consumed tokens: 12665618432 | elapsed time per iteration (ms): 6158.4 | learning rate: 1.969E-04 | global batch size: 512 | lm loss: 2.179331E+00 | loss scale: 131072.0 | grad norm: 11304.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 22000 | lm loss value: 2.123673E+00 | lm loss PPL: 8.361794E+00 | ------------------------------------------------------------------------------------------------- iteration 22200/ 152972 | consumed samples: 6286784 | consumed tokens: 12875333632 | elapsed time per iteration (ms): 7103.5 | learning rate: 1.968E-04 | global batch size: 512 | lm loss: 2.141497E+00 | loss scale: 262144.0 | grad norm: 21875.317 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 |
iteration 22400/ 152972 | consumed samples: 6389184 | consumed tokens: 13085048832 | elapsed time per iteration (ms): 6142.3 | learning rate: 1.967E-04 | global batch size: 512 | lm loss: 2.156774E+00 | loss scale: 262144.0 | grad norm: 23609.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 14:46:02,079] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/mp_rank_00_model_states.pt
[2021-11-03 14:46:02,080] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/mp_rank_01_model_states.pt
[2021-11-03 14:46:02,442 to 14:46:02,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt (64 per-rank optimizer-state files; repeated save lines condensed)
successfully saved checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1161.09
iteration 22600/ 152972 | consumed samples: 6491584 | consumed tokens: 13294764032 | elapsed time per iteration (ms): 6219.2 | learning rate: 1.965E-04 | global batch size: 512 | lm loss: 2.154578E+00 | loss scale: 524288.0 | grad norm: 46552.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 22800/ 152972 | consumed samples: 6593984 | consumed tokens: 13504479232 | elapsed time per iteration (ms): 6227.8 | learning rate: 1.964E-04 | global batch size: 512 | lm loss: 2.146162E+00 | loss scale: 524288.0 | grad norm: 166710.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23000/ 152972 | consumed samples: 6696384 | consumed tokens: 13714194432 | elapsed time per iteration (ms): 6234.8 | learning rate: 1.963E-04 | global batch size: 512 | lm loss: 2.142305E+00 | loss scale: 524288.0 | grad norm: 46735.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 23000 | lm loss value: 2.145347E+00 | lm loss PPL: 8.545003E+00 |
-------------------------------------------------------------------------------------------------
iteration 23200/ 152972 | consumed samples: 6798784 | consumed tokens: 13923909632 | elapsed time per iteration (ms): 7266.2 | learning rate: 1.962E-04 | global batch size: 512 | lm loss: 2.136395E+00 | loss scale: 1048576.0 | grad norm: 81093.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23400/ 152972 | consumed samples: 6901184 | consumed tokens: 14133624832 | elapsed time per iteration (ms): 6192.7 | learning rate: 1.961E-04 | global batch size: 512 | lm loss: 2.145520E+00 | loss scale: 1048576.0 | grad norm: 104944.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23600/ 152972 | consumed samples: 7003584 | consumed tokens: 14343340032 | elapsed time per iteration (ms): 6207.8 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 2.137600E+00 | loss scale: 524288.0 | grad norm: 47023.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23800/ 152972 | consumed samples: 7105984 | consumed tokens: 14553055232 | elapsed time per iteration (ms): 6191.3 | learning rate: 1.958E-04 | global batch size: 512 | lm loss: 2.120955E+00 | loss scale: 524288.0 | grad norm: 43944.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-03 17:24:45,898] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=43, lr=[0.00019571501545678581, 0.00019571501545678581], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 24000 loss: 2.4001 iter time (s): 0.003 samples/sec: 166338.839
iteration 24000/ 152972 | consumed samples: 7208384 | consumed tokens: 14762770432 | elapsed time per iteration
(ms): 6178.3 | learning rate: 1.957E-04 | global batch size: 512 | lm loss: 2.145050E+00 | loss scale: 1048576.0 | grad norm: 99932.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 24000 | lm loss value: 2.104485E+00 | lm loss PPL: 8.202874E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 17:28:23,622] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/mp_rank_01_model_states.pt
[2021-11-03 17:28:23,706] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/mp_rank_00_model_states.pt
[2021-11-03 17:28:24,065 to 17:28:24,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt (per-rank optimizer-state files; repeated save lines condensed)
[2021-11-03 17:28:24,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-03 17:28:24,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-03 17:28:24,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-03 17:28:24,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-03 17:28:24,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_09_optim_states.pt successfully saved checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1889.29 iteration 24200/ 152972 | consumed samples: 7310784 | consumed tokens: 14972485632 | elapsed time per iteration (ms): 7303.5 | learning rate: 1.956E-04 | global batch size: 512 | lm loss: 2.131296E+00 | loss scale: 1048576.0 | grad norm: 95563.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24400/ 152972 | consumed samples: 7413184 | consumed tokens: 15182200832 | elapsed time per iteration (ms): 6214.0 | learning rate: 1.955E-04 | global batch size: 512 | lm loss: 2.146177E+00 | loss scale: 1048576.0 | grad norm: 87485.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24600/ 152972 | consumed samples: 7515584 | consumed 
tokens: 15391916032 | elapsed time per iteration (ms): 6280.8 | learning rate: 1.953E-04 | global batch size: 512 | lm loss: 2.128909E+00 | loss scale: 1048576.0 | grad norm: 97207.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24800/ 152972 | consumed samples: 7617984 | consumed tokens: 15601631232 | elapsed time per iteration (ms): 6187.3 | learning rate: 1.952E-04 | global batch size: 512 | lm loss: 2.133203E+00 | loss scale: 524288.0 | grad norm: 45912.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 25000/ 152972 | consumed samples: 7720384 | consumed tokens: 15811346432 | elapsed time per iteration (ms): 6228.0 | learning rate: 1.951E-04 | global batch size: 512 | lm loss: 2.125260E+00 | loss scale: 524288.0 | grad norm: 43973.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 25000 | lm loss value: 2.112226E+00 | lm loss PPL: 8.266620E+00 | ------------------------------------------------------------------------------------------------- iteration 25200/ 152972 | consumed samples: 7822784 | consumed tokens: 16021061632 | elapsed time per iteration (ms): 7240.5 | learning rate: 1.949E-04 | global batch size: 512 | lm loss: 2.138931E+00 | loss scale: 1048576.0 | grad norm: 84720.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 25400/ 152972 | consumed samples: 7925184 | consumed tokens: 16230776832 | elapsed time per iteration (ms): 6222.4 | learning rate: 1.948E-04 | global batch size: 512 | lm loss: 2.116920E+00 | loss scale: 1048576.0 | grad norm: 100649.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 25500 to 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 20:07:25,627] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/mp_rank_01_model_states.pt
[2021-11-03 20:07:25,703] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/mp_rank_00_model_states.pt
[2021-11-03 20:07:26,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[... repeated "zero checkpoint saved" lines for the remaining zero_pp_rank_*_mp_rank_* optimizer-state shards of global_step25500 elided ...]
successfully saved checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 2315.49
iteration 25600/ 152972 | consumed samples: 8027584 | consumed tokens: 16440492032 | elapsed time per iteration (ms): 6242.7 | learning rate: 1.947E-04 | global batch size: 512 | lm loss: 2.127425E+00 | loss scale: 524288.0 | grad norm: 42897.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 25800/ 152972 | consumed samples: 8129984 | consumed tokens: 16650207232 | elapsed time per iteration (ms): 6239.0 | learning rate: 1.945E-04 | global batch size: 512 | lm loss: 2.107143E+00 | loss scale: 524288.0 | grad norm: 45165.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-03 20:59:23,949] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=48, lr=[0.00019438888040786292, 0.00019438888040786292], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 26000/ 152972 | consumed samples: 8232384 | consumed tokens: 16859922432 | elapsed time per iteration (ms): 6232.0 | learning rate: 1.944E-04 | global batch size: 512 | lm loss: 2.119192E+00 | loss scale: 1048576.0 | grad norm: 88224.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 26000 loss: 2.1227 iter time (s): 0.003 samples/sec: 163161.850
-------------------------------------------------------------------------------------------------
validation loss at iteration 26000 | lm loss value: 2.104676E+00 | lm loss PPL: 8.204442E+00 |
-------------------------------------------------------------------------------------------------
iteration 26200/ 152972 | consumed samples: 8334784 | consumed tokens: 17069637632 | elapsed time per iteration (ms): 7261.1 | learning rate: 1.942E-04 | global batch size: 512 | lm loss: 2.120884E+00 | loss scale: 1048576.0 | grad norm: 86149.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26400/ 152972 | consumed samples: 8437184 | consumed tokens: 17279352832 | elapsed time per iteration (ms): 6236.4 | learning rate: 1.941E-04 | global batch size: 512 | lm loss: 2.129324E+00 | loss scale: 1048576.0 | grad norm: 84281.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26600/ 152972 | consumed samples: 8539584 | consumed tokens: 17489068032 | elapsed time per iteration (ms): 6247.2 | learning rate: 1.940E-04 | global batch size: 512 | lm loss: 2.105178E+00 | loss scale: 1048576.0 | grad norm: 86034.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26800/ 152972 | consumed samples: 8641984 | consumed tokens: 17698783232 | elapsed time per iteration (ms): 6227.6 | learning rate: 1.938E-04 | global batch size: 512 | lm loss: 2.122990E+00 | loss scale: 1048576.0 | grad norm: 92453.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27000/ 152972 | consumed samples: 8744384 | consumed tokens: 17908498432 | elapsed time per iteration (ms): 6241.9 | learning rate: 1.937E-04 | global batch size: 512 | lm loss: 2.119131E+00 | loss scale: 1048576.0 | grad norm: 81408.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 27000 | lm loss value: 2.086685E+00 | lm loss PPL: 8.058160E+00 |
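As an aside, the logged quantities above are internally consistent and easy to verify: the reported "lm loss PPL" is just exp(lm loss value), and consumed tokens track consumed samples at 2048 tokens per sample (the sequence length inferred from their ratio), with each logged interval covering 200 iterations at global batch size 512. A quick sanity-check sketch, not part of the training code:

```python
import math

# "lm loss PPL" is exp("lm loss value"); check the validation records above.
for loss, logged_ppl in [(2.104676, 8.204442), (2.086685, 8.058160)]:
    assert abs(math.exp(loss) - logged_ppl) < 1e-3

# Bookkeeping: consumed tokens / consumed samples gives the sequence length,
# and one logged interval advances by 200 iterations * 512 samples.
samples_26800, tokens_26800 = 8641984, 17698783232
samples_27000, tokens_27000 = 8744384, 17908498432
assert tokens_27000 == samples_27000 * 2048          # 2048-token sequences
assert samples_27000 - samples_26800 == 200 * 512    # one logged interval
assert tokens_27000 - tokens_26800 == 200 * 512 * 2048
```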
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 22:50:10,103] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/mp_rank_01_model_states.pt
[2021-11-03 22:50:10,125] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/mp_rank_00_model_states.pt
[2021-11-03 22:50:11,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[... repeated "zero checkpoint saved" lines for the remaining zero_pp_rank_*_mp_rank_* optimizer-state shards of global_step27000 elided ...]
[2021-11-03 22:50:11,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-03 22:50:11,246] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,256] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,257] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-03 22:50:11,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-03 22:50:11,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-03 22:50:11,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-03 22:50:11,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,358] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-03 22:50:11,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-03 22:50:11,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-03 22:50:11,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-03 22:50:11,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_15_optim_states.pt
[2021-11-03 22:50:12,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_03_optim_states.pt
successfully saved checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 2774.82
iteration 27200/ 152972 | consumed samples: 8846784 | consumed tokens: 18118213632 | elapsed time per iteration (ms): 7250.6 | learning rate: 1.935E-04 | global batch size: 512 | lm loss: 2.120025E+00 | loss scale: 2097152.0 | grad norm: 171192.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27400/ 152972 | consumed samples: 8949184 | consumed tokens: 18327928832 | elapsed time per iteration (ms): 6274.7 | learning rate: 1.934E-04 | global batch size: 512 | lm loss: 2.116377E+00 | loss scale: 1048576.0 | grad norm: 87551.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27600/ 152972 | consumed samples: 9051584 | consumed tokens: 18537644032 | elapsed time per iteration (ms): 6218.6 | learning rate: 1.932E-04 | global batch size: 512 | lm loss: 2.111652E+00 | loss scale: 524288.0 | grad norm: 46189.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27800/ 152972 | consumed samples: 9153984 | consumed tokens: 18747359232 | elapsed time per iteration (ms): 6199.4 | learning rate: 1.931E-04 | global batch size: 512 | lm loss: 2.809586E+00 | loss scale: 32768.0 | grad norm: 3530.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 00:33:56,259] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=58, lr=[0.00019289429310383492, 0.00019289429310383492], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 28000 loss: 2.4090 iter time (s): 0.003 samples/sec: 164994.186
iteration 28000/ 152972 | consumed samples: 9256384 | consumed tokens: 18957074432 | elapsed time per iteration (ms): 6204.0 | learning rate: 1.929E-04 | global batch size: 512 | lm loss: 2.129491E+00 | loss scale: 32768.0 | grad norm: 3274.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 28000 | lm loss value: 2.104125E+00 | lm loss PPL: 8.199929E+00 |
-------------------------------------------------------------------------------------------------
iteration 28200/ 152972 | consumed samples: 9358784 | consumed tokens: 19166789632 | elapsed time per iteration (ms): 7238.1 | learning rate: 1.927E-04 | global batch size: 512 | lm loss: 2.118730E+00 | loss scale: 65536.0 | grad norm: 5685.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28400/ 152972 | consumed samples: 9461184 | consumed tokens: 19376504832 | elapsed time per iteration (ms): 6216.6 | learning rate: 1.926E-04 | global batch size: 512 | lm loss: 2.121063E+00 | loss scale: 65536.0 | grad norm: 5641.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 01:29:10,847] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/mp_rank_00_model_states.pt
[2021-11-04 01:29:10,854] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/mp_rank_01_model_states.pt [2021-11-04 01:29:11,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,765] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 01:29:12,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_13_optim_states.pt
successfully saved checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1783.78
iteration 28600/ 152972 | consumed samples: 9563584 | consumed tokens: 19586220032 | elapsed time per iteration (ms): 6237.5 | learning rate: 1.924E-04 | global batch size: 512 | lm loss: 2.118943E+00 | loss scale: 65536.0 | grad norm: 5540.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28800/ 152972 | consumed samples: 9665984 | consumed tokens: 19795935232 | elapsed time per iteration (ms): 6222.3 | learning rate: 1.922E-04 | global batch size: 512 | lm loss: 2.107822E+00 | loss scale: 131072.0 | grad norm: 10244.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29000/ 152972 | consumed samples: 9768384 | consumed tokens: 20005650432 | elapsed time per iteration (ms): 6242.1 | learning rate: 1.921E-04 | global batch size: 512 | lm loss: 2.096643E+00 | loss scale: 131072.0 | grad norm: 10914.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 29000 | lm loss value: 2.074451E+00 | lm loss PPL: 7.960179E+00 |
-------------------------------------------------------------------------------------------------
iteration 29200/ 152972 | consumed samples: 9870784 | consumed tokens: 20215365632 | elapsed time per iteration (ms): 7206.9 | learning rate: 1.919E-04 | global batch size: 512 | lm loss: 2.111728E+00 | loss scale: 262144.0 | grad norm: 23340.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29400/ 152972 | consumed samples: 9973184 | consumed tokens: 20425080832 | elapsed time per iteration (ms): 6187.2 | learning rate: 1.917E-04 | global batch size: 512 | lm loss: 2.093884E+00 | loss scale: 262144.0 | grad norm: 21416.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29600/ 152972 | consumed samples: 10075584 | consumed tokens: 20634796032 | elapsed time per iteration (ms): 6203.2 | learning rate: 1.916E-04 | global batch size: 512 | lm loss: 2.107256E+00 | loss scale: 262144.0 | grad norm: 21814.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29800/ 152972 | consumed samples: 10177984 | consumed tokens: 20844511232 | elapsed time per iteration (ms): 6205.0 | learning rate: 1.914E-04 | global batch size: 512 | lm loss: 2.096914E+00 | loss scale: 524288.0 | grad norm: 42469.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 04:07:49,795] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=58, lr=[0.0001912222371885727, 0.0001912222371885727], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 30000 loss: 2.1393 iter time (s): 0.003 samples/sec: 165514.637
iteration 30000/ 152972 | consumed samples: 10280384 | consumed tokens: 21054226432 | elapsed time per iteration (ms): 6208.9 | learning rate: 1.912E-04 | global batch size: 512 | lm loss: 2.087625E+00 | loss scale: 524288.0 | grad norm: 43867.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 30000 | lm loss value: 2.083765E+00 | lm loss PPL: 8.034659E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 04:11:12,066] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model
checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30000/mp_rank_00_model_states.pt
[2021-11-04 04:11:12,085] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30000/mp_rank_01_model_states.pt
[… 64 repeated "[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved …/global_step30000/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt" lines, timestamps 2021-11-04 04:11:12,479 through 04:11:12,801, condensed …]
successfully saved checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1206.69
iteration 30200/ 152972 | consumed samples: 10382784 | consumed tokens: 21263941632 | elapsed time per iteration (ms): 7244.6 | learning rate: 1.910E-04 | global batch size: 512 | lm loss: 2.081899E+00 | loss scale: 1048576.0 | grad norm: 85511.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30400/ 152972 | consumed samples: 10485184 | consumed tokens: 21473656832 | elapsed time per iteration (ms): 6192.5 | learning rate: 1.909E-04 | global batch size: 512 | lm loss: 2.101988E+00 | loss scale: 1048576.0 | grad norm: 83616.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30600/ 152972 | consumed samples: 10587584 | consumed tokens: 21683372032 | elapsed time per iteration (ms): 6217.4 | learning rate: 1.907E-04 | global batch size: 512 | lm loss: 2.091301E+00 | loss scale: 524288.0 | grad norm: 45865.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30800/ 152972 | consumed samples: 10689984 | consumed tokens: 21893087232 | elapsed time per iteration (ms): 6219.9 | learning rate: 1.905E-04 | global batch size: 512 | lm loss: 2.086251E+00 | loss scale: 524288.0 | grad norm: 47683.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 30807 to
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 05:34:48,885] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30807/mp_rank_01_model_states.pt
[2021-11-04 05:34:48,886] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30807/mp_rank_00_model_states.pt
[… 64 repeated "[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved …/global_step30807/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt" lines, timestamps 2021-11-04 05:34:49,273 through 05:34:49,593, condensed …]
successfully saved checkpoint at iteration 30807 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1320.06
[exiting program after 1190.0076513091724 minutes] datetime: 2021-11-04 05:34:49
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
[NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... utils[OKAY] .................. [YES] ...... quantizer[OKAY] .............. [NO] ....... [OKAY]quantizer .............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO] quantizer....... ..............[OKAY] [NO] ....... 
[OKAY]-------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.torch install path ..................... 
torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name op name ................................ ................ installed................installedinstalled ....installed.. compatible compatible ..compatible---------------------------------------------------------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam............... [YES][YES]cpu_adam .......................................... [OKAY][OKAY][YES][YES] ............ [OKAY][OKAY] fused_adam ............. fused_adam[YES] fused_adam......fused_adam............. .............[OKAY][YES]............. [YES] ......[YES] [OKAY]......fused_lamb ...... .............[OKAY][OKAY] fused_lamb[YES] ................... fused_lambfused_lamb[OKAY] [YES] ............. ............. ...... [YES] [YES] [OKAY] ...... ......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................sparse_attn sparse_attn [NO] [YES] ..................................... [OKAY][NO][OKAY] [NO] .......transformer....... stochastic_transformer [OKAY]............ [OKAY]. 
[YES]transformer transformer [YES] .................. ............ [YES]...... [OKAY] [YES][OKAY] ...... ......[OKAY] stochastic_transformer [OKAY] . stochastic_transformer[YES] .......stochastic_transformer [YES][OKAY]. ......[YES] [OKAY]...... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .. ......compatible compatible compatible compatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... ............... [YES]............... ............... [YES] ......[YES][YES]...... ......[OKAY]......[OKAY] [OKAY][OKAY] fused_adamfused_adam fused_adam.............fused_adam............. .............[YES]............. [YES] [YES] ......[YES] ...... ............ [OKAY][OKAY] [OKAY] [OKAY] fused_lambfused_lambfused_lamb fused_lamb ............. .......................... ............. [YES][YES] [YES] [YES] ...... .................. [OKAY] [OKAY][OKAY] [OKAY] sparse_attnsparse_attnsparse_attnsparse_attn .................................... ............ [NO][NO] [NO][NO]....... .....................[OKAY] [OKAY][OKAY] [OKAY] transformertransformer transformertransformer ............ ............ 
........................[YES] [YES] [YES] [YES]...... ...... ......[OKAY] ...... [OKAY][OKAY][OKAY] stochastic_transformer . [YES]stochastic_transformerstochastic_transformerstochastic_transformer ........ .[OKAY][YES] ......[YES] [YES][OKAY]...... ......[OKAY] [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version DeepSpeed general environment info:..................... 11.2 deepspeed install path ........... torch install path['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info............... ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']...... torch 1.8, cuda 11.1 torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yumquantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
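Since JIT-compiled ops need the ninja build tool at runtime, the first line of the report checks for it. A minimal sketch of that kind of check (a hypothetical re-implementation for illustration, not DeepSpeed's actual code):

```python
import shutil

# JIT-compiled DeepSpeed ops are built with ninja at runtime, so the
# op report first verifies that the ninja binary is on the PATH.
def ninja_available() -> bool:
    return shutil.which("ninja") is not None

status = "[OKAY]" if ninja_available() else "[MISSING]"
print(f"ninja .................. {status}")
```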
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
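The async_io warnings spell out their own fix; sketched below assuming a yum-based system and a hypothetical /opt/libaio install prefix:

```shell
# Suggested by the warning itself (needs root; shown commented out):
#   yum install -y libaio-devel
# If libaio was instead built from source, point the JIT build at it
# via CFLAGS/LDFLAGS, as the warning advises (paths are hypothetical):
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
echo "CFLAGS=$CFLAGS LDFLAGS=$LDFLAGS"
```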
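The launcher's OMP_NUM_THREADS notice can be acted on before heavy libraries load; a minimal sketch (the thread count of 4 is a hypothetical choice, not a recommendation from the log):

```python
import os

# The launcher defaults OMP_NUM_THREADS to 1 per process to avoid
# oversubscribing CPU cores. Set it explicitly (before importing
# torch/numpy) if CPU-side ops benefit from more threads.
os.environ.setdefault("OMP_NUM_THREADS", "4")
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```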
***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. 
..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name................op name op name................installed................ ................installed..installed installed..compatible.. .. compatiblecompatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... ..............................[YES]cpu_adam [YES][YES]..................... ...... ......[YES] [OKAY] [OKAY] [OKAY]...... [OKAY] fused_adamfused_adamfused_adam ..........................fused_adam............. [YES] ............. [YES][YES] ...... [YES]...... ...... [OKAY] ...... [OKAY] [OKAY] [OKAY] fused_lamb .............fused_lambfused_lamb fused_lamb.............[YES] ............. .............[YES][YES]...... [YES]......[OKAY] ...... [OKAY]...... [OKAY][OKAY] sparse_attnsparse_attn ........................sparse_attnsparse_attn [NO][NO]........................ .......[NO].......[NO] .......[OKAY]....... [OKAY] [OKAY][OKAY]transformer transformertransformer............ transformer ............ ............[YES]............ [YES][YES][YES] ...... ...... ......[OKAY]...... [OKAY][OKAY][OKAY] stochastic_transformer .stochastic_transformerstochastic_transformer stochastic_transformer [YES] ......... [YES][YES][YES][OKAY] .................. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ...............
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
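An environment summary like the block above can be approximated by hand when DeepSpeed's own reporting is unavailable. A minimal stdlib-only sketch; the helper name `env_report` and the output layout are illustrative, not part of Megatron or DeepSpeed:

```python
import importlib.metadata
import platform


def env_report(packages=("torch", "deepspeed")):
    """Build a small environment summary, similar in spirit to
    DeepSpeed's "general environment info" block above."""
    lines = [f"python version ................... {platform.python_version()}"]
    for name in packages:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Mirror the log's [NO] marker for missing components.
            version = "[NO]"
        dots = "." * max(1, 20 - len(name))
        lines.append(f"{name} version {dots} {version}")
    return "\n".join(lines)


print(env_report())
```

This only reads installed-package metadata; the real DeepSpeed report additionally inspects the CUDA toolkit and the wheel's compile-time versions.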
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> setting tensorboard ...
torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] 1.8.2 torch versiontorch cuda version ................................... 1.8.211.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.5.5+58a8e13, 58a8e13, master['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... 
False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1825190.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. 
nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 
200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 
16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ...........................................
949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  train_tokens .................................... None
  use_bnb_optimizer ............................... False
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ......................................
1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ......
[OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-11-04 05:38:40,414] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.328 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 50.903 seconds
time to initialize megatron (seconds): 136.596
[after megatron is initialized] datetime: 2021-11-04 05:39:36
building GPT model ...
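Several derived numbers in this log follow arithmetically from the argument dump: the padded vocab (50257 → 50688 with 431 dummy tokens), the batch-size rampup (32 → 512 in steps of 32 over 2,000,000 samples), and the 64-rank topology printed next (tensor=4 × pipeline=4 × data=4). The sketch below reproduces them; the helper names are hypothetical, and the rampup schedule is an approximation of Megatron's stepwise increment rule, not its exact implementation:

```python
import math

# Values taken from the argument dump above.
MAKE_DIVISIBLE_BY = 128   # make_vocab_size_divisible_by
TP = 4                    # tensor_model_parallel_size
PP = 4                    # pipeline_model_parallel_size
WORLD_SIZE = 64
DP = WORLD_SIZE // (TP * PP)   # data-parallel degree: 4

def padded_vocab_size(orig_vocab_size: int) -> int:
    # Megatron pads the vocab to a multiple of
    # make_vocab_size_divisible_by * tensor_model_parallel_size,
    # so the embedding shards evenly across tensor-parallel ranks.
    multiple = MAKE_DIVISIBLE_BY * TP
    return multiple * math.ceil(orig_vocab_size / multiple)

def rampup_batch_size(consumed_samples: int, start=32, increment=32,
                      ramp_samples=2_000_000, final=512) -> int:
    # rampup_batch_size = ['32', '32', '2_000_000']: grow the global batch
    # from `start` to `final` in steps of `increment`, spread evenly over
    # `ramp_samples` consumed samples (approximate schedule).
    steps = (final - start) // increment
    samples_per_step = ramp_samples // steps
    return start + increment * min(consumed_samples // samples_per_step, steps)

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    # Rank layout matching the printed topology: the model (tensor)
    # dimension varies fastest, then data, then pipe.
    return pipe * (DP * TP) + data * TP + model

print(padded_vocab_size(50257))      # 50688, i.e. 431 dummy tokens
print(rampup_batch_size(0))          # 32 at the start of training
print(rampup_batch_size(2_000_000))  # 512 once rampup completes
print(coord_to_rank(1, 0, 0))        # 16, as in the topology map
```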
[2021-11-04 05:39:36,647] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-04 05:39:36,648] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-04 05:39:36,648] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.41 GB, percent = 22.1%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-04 05:39:37,171] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
[2021-11-04 05:39:37,563] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-04 05:39:37,564] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB Max_MA 0.21 GB CA 0.22 GB Max_CA 0 GB
[2021-11-04 05:39:37,564] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.65 GB, percent = 22.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
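The `Using topology:` dump above enumerates a 4 × 4 × 4 grid (pipeline × data × tensor parallelism) over the 64 GPUs. Its rank assignment can be reproduced with simple row-major index arithmetic; this is a sketch whose axis ordering is read off the printed dictionary, not taken from the Megatron-DeepSpeed source:

```python
# Reconstruct the ProcessCoord -> rank mapping printed in the topology dump:
# pipe is the slowest-varying axis, then data, then (tensor) model.
PIPE, DATA, MODEL = 4, 4, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DATA + data) * MODEL + model

# Spot-checks against entries from the log:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(2, 1, 1) == 37
assert coord_to_rank(3, 3, 3) == 63
```

This layout keeps the four tensor-parallel ranks of each stage on consecutive ranks (i.e. on the same node), which is where the bandwidth-hungry tensor-parallel all-reduces happen.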
[2021-11-04 05:39:37,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-04 05:39:37,653] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-04 05:39:37,653] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-04 05:39:37,653] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-04 05:39:37,654] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-04 05:39:37,654] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-04 05:39:37,654] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-04 05:39:37,654] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-04 05:39:37,654] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-04 05:39:37,654] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-04 05:39:37,654] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-04 05:39:37,967] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-04 05:39:37,967] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB Max_MA 0.35 GB CA 0.59 GB Max_CA 1 GB
[2021-11-04 05:39:37,968] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:37,994] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-04 05:39:37,994] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.59 GB CA 0.89 GB Max_CA 1 GB
[2021-11-04 05:39:37,995] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:37,995] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-04 05:39:38,018] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-04 05:39:38,019] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.49 GB CA 0.89 GB Max_CA 1 GB
[2021-11-04 05:39:38,019] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-04 05:39:38,019] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-04 05:39:38,019] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile":
false
}
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   allreduce_always_fp32 ........ False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   amp_enabled .................. False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   amp_params ................... False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   checkpoint_tag_validation_enabled  True
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   checkpoint_tag_validation_fail  False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   curriculum_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   curriculum_params ............ False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dataloader_drop_last ......... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   disable_allgather ............ False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dump_state ................... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_gas_boundary_resolution  1
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_layer_num ......... 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_max_iter .......... 100
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_stability ......... 1e-06
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_tol ............... 0.01
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_verbose ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   elasticity_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_enabled ................. True
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_master_weights_and_gradients  False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_mixed_quantize .......... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   global_rank .................. 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_accumulation_steps .. 16
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_clipping ............ 1.0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_predivide_factor .... 1.0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   initial_dynamic_scale ........ 4096
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   loss_scale ................... 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   memory_breakdown ............. False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_legacy_fusion ...... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_name ............... None
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_params ............. None
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pld_enabled .................. False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pld_params ................... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   prescale_gradients ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_change_rate ......... 0.001
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_groups .............. 1
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_offset .............. 1000
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_period .............. 1000
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_rounding ............ 0
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_start_bits .......... 16
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_target_bits ......... 8
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_training_enabled .... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_type ................ 0
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_verbose ............. False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   scheduler_name ............... None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   scheduler_params ............. None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   sparse_attention ............. None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   sparse_gradients_enabled ..... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   steps_per_print .............. 2000
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_enabled .......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_output_path ......
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   train_batch_size ............. 512
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   train_micro_batch_size_per_gpu  8
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   use_quantizer_kernel ......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   wall_clock_breakdown ......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   world_size ................... 4
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_allow_untested_optimizer  False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_enabled ................. True
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_optimization_stage ...... 1
[2021-11-04 05:39:38,021] [INFO] [config.py:946:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-11-04 05:39:38,021] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M)
TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 35
loading 4 zero partition checkpoints for rank 35
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 20
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 20
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 59
loading 4 zero partition checkpoints for rank 47
successfully loaded 4 ZeRO state_dicts for rank 46
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 37
successfully loaded 4 ZeRO state_dicts for rank 45
loading 4 zero partition checkpoints for rank 18
successfully loaded 4 ZeRO state_dicts for rank 19
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 39
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 39
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 41
loading 4 zero partition checkpoints for rank 40
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 32
loading 4 zero partition checkpoints for rank 42
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 48
loading 4 zero partition checkpoints for rank 41
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 27
loading 4 zero partition checkpoints for rank 32
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 16
successfully loaded 4 ZeRO state_dicts for rank 38
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 27
successfully loaded 4 ZeRO state_dicts for rank 57
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 57
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 12
successfully loaded 4 ZeRO state_dicts for rank 49
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 29
loading 4 zero partition checkpoints for rank 31
successfully loaded 4 ZeRO state_dicts for rank 60
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 49
successfully loaded 4 ZeRO state_dicts for rank 15
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 4
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 52
loading 4 zero partition checkpoints for rank 60
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 15
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 25
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 5
successfully loaded 4 ZeRO state_dicts for rank 11
loading 4 zero partition checkpoints for rank 25
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 13
loading 4 zero partition checkpoints for rank 11
successfully loaded 4 ZeRO state_dicts for rank 62
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 13
successfully loaded 4 ZeRO state_dicts for rank 2
loading 4 zero partition checkpoints for rank 62
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 14
loading 4 zero partition checkpoints for rank 2
successfully loaded 4 ZeRO state_dicts for rank 10
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 10
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 9
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 50
loading 4 zero partition checkpoints for rank 1
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 3
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 3
successfully loaded 4 ZeRO state_dicts for rank 7
loading 4 zero partition checkpoints for rank 7
  successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints at iteration 30807
time (ms) | load-checkpoint: 15796.49
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
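The identical UserWarning appears once per worker because Python's `warnings` module deduplicates only within a single process; a multi-rank job therefore reprints it for every rank. A minimal, hypothetical clean-up (not part of the original run) would install a per-process filter before model construction:

```python
import warnings

# With the filter active in a worker process, the Megatron parameter-count
# warning is dropped instead of being printed once per rank.
with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings(
        "ignore",
        message=r"Parameter count with the embeddings will be inaccurate",
        category=UserWarning,
    )
    warnings.warn(
        "Parameter count with the embeddings will be inaccurate with PP > 1, "
        "as the first and last stage hold several copies of the embeddings"
    )

# caught stays empty: the warning was filtered before display.
print(len(caught))
```

Each process keeps its own filter list, so the call has to run in every rank (e.g. early in the training script), not just on rank 0.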
estimated model parameters: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-04 05:39:54
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 4.238386 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.236 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.278 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.078 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
Number of parameters: 1.691828224 billion
Number of parameters without embeddings: 1.209483264 billion
time (ms) | model-and-optimizer-setup: 17582.97 | train/valid/test-data-iterators-setup: 9691.56
[after dataloaders are built] datetime: 2021-11-04 05:40:06
done with setup ...
training ...
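As a quick sanity check on the two parameter counts reported above, the gap between the total and the "without embeddings" figure is the embedding parameters. This is a minimal arithmetic sketch (both values copied verbatim from the log); the interpretation of the difference as the embedding matrices is an assumption, not something the log states:

```python
# Arithmetic check on the two parameter counts printed above (values from the log).
total_params = 1_691_828_224           # "Number of parameters: 1.691828224 billion"
non_embedding_params = 1_209_483_264   # "Number of parameters without embeddings: 1.209483264 billion"

# Assumed: the difference is the embedding parameters.
embedding_params = total_params - non_embedding_params
print(embedding_params)                           # 482344960 -> ~0.48B parameters
print(round(embedding_params / total_params, 3))  # 0.285 -> roughly 28.5% of the model
```

For a 1.3B-class model this large embedding share is expected: the vocabulary projection dominates at small hidden sizes and shrinks in relative terms as models grow.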
Number of parameters: 1.209483264 billion
Number of parameters: 1.691828224 billion
Number of parameters: 1.69189376 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-11-04 05:40:06
[2021-11-04 05:40:06,719] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 35] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4454.0 | max reserved: 4454.0
[Rank 3] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
[Rank 19] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 51] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 6912.0 | max reserved: 6912.0
[Rank 2] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5712.0 | max reserved: 5712.0
[Rank 18] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 50] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 34] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
iteration 31000/ 152972 | consumed samples: 10792384 | consumed tokens: 22102802432 | elapsed time per iteration (ms): 6251.2 | learning rate: 1.903E-04 | global batch size: 512 | lm loss: 2.069374E+00 | loss scale: 524288.0 | grad norm: 38134.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 0] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
[Rank 16] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4774.0 | max reserved: 4774.0
[Rank 32] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 48] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 33] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 17] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4774.0 | max reserved: 4774.0
[Rank 49] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 1] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
-------------------------------------------------------------------------------------------------
 validation loss at iteration 31000 | lm loss value: 2.049838E+00 | lm loss PPL: 7.766640E+00 |
-------------------------------------------------------------------------------------------------
iteration 31200/ 152972 | consumed samples: 10894784 | consumed tokens: 22312517632 | elapsed time per iteration (ms): 7079.7 | learning rate: 1.901E-04 | global batch size: 512 | lm loss: 2.067551E+00 | loss scale: 1048576.0 | grad norm: 75132.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31400/ 152972 | consumed samples: 10997184 | consumed tokens: 22522232832 | elapsed time per iteration (ms): 6104.1 | learning rate: 1.900E-04 | global batch size: 512 | lm loss: 2.066499E+00 | loss scale: 1048576.0 | grad norm: 94494.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 06:54:24,876] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/mp_rank_00_model_states.pt
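Two consistency checks on the records above, under conventions assumed here rather than stated in the log: lm loss PPL is exp(lm loss), and consumed tokens equal consumed samples times the sequence length, which the `2048sl` suffix in the index-map filenames suggests is 2048:

```python
import math

# Validation record at iteration 31000: lm loss 2.049838E+00, lm loss PPL 7.766640E+00.
# Assumed convention: PPL = exp(loss).
ppl = math.exp(2.049838)
print(round(ppl, 4))   # ~7.7666, matching the reported PPL

# Iteration 31000 record: consumed samples 10792384, consumed tokens 22102802432.
# Assumed: sequence length 2048 (from the "2048sl" index-map filenames).
tokens = 10_792_384 * 2048
print(tokens)          # 22102802432, matching the log
```

Both relations holding exactly is a useful quick test that a log excerpt has not been garbled in transcription.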
[2021-11-04 06:54:24,884] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/mp_rank_01_model_states.pt
[2021-11-04 06:54:25,251] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2021-11-04 06:54:25,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/zero_pp_rank_3_mp_rank_01_optim_states.pt
  successfully saved checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1166.57
iteration 31600/ 152972 | consumed samples: 11099584 | consumed tokens: 22731948032 | elapsed time per iteration (ms): 6150.5 | learning rate: 1.898E-04 | global batch size: 512 | lm loss: 2.087465E+00 | loss scale: 2097152.0 | grad norm: 144070.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31800/ 152972 | consumed samples: 11201984 | consumed tokens: 22941663232 | elapsed time per iteration (ms): 6192.3 | learning rate: 1.896E-04 | global batch size: 512 | lm loss: 2.053407E+00 | loss scale: 2097152.0 | grad norm: 162708.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 07:45:40,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=63, lr=[0.00018938783712130853, 0.00018938783712130853], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 32000/ 152972 | consumed samples: 11304384 | consumed tokens: 23151378432 | elapsed time per iteration (ms): 6113.0 | learning rate: 1.894E-04 | global batch size: 512 | lm loss: 2.080148E+00 | loss scale: 524288.0 | grad norm: 42846.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 32000 loss: 2.0391 iter time (s): 0.003 samples/sec: 168813.429
-------------------------------------------------------------------------------------------------
 validation loss at iteration 32000 | lm loss value: 2.024950E+00 | lm loss PPL: 7.575734E+00 |
-------------------------------------------------------------------------------------------------
iteration 32200/ 152972 | consumed samples: 11406784 | consumed tokens: 23361093632 | elapsed time per iteration (ms): 7101.5 | learning rate: 1.892E-04 | global batch size: 512 | lm loss: 2.079365E+00 | loss scale: 524288.0 | grad norm: 44540.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32400/ 152972 | consumed samples: 11509184 | consumed tokens: 23570808832 | elapsed time per iteration (ms): 6063.2 | learning rate: 1.890E-04 | global batch size: 512 | lm loss: 2.076157E+00 | loss scale: 524288.0 | grad norm: 45004.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32600/ 152972 | consumed samples: 11611584 | consumed tokens: 23780524032 | elapsed time per iteration (ms): 6065.5 | learning rate: 1.888E-04 | global batch size: 512 | lm loss: 2.062989E+00 | loss scale: 1048576.0 | grad norm: 86946.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32800/ 152972 | consumed samples: 11713984 | consumed tokens: 23990239232 | elapsed time per iteration (ms): 6070.9 | learning rate: 1.886E-04 | global batch size: 512 | lm loss: 2.065025E+00 | loss scale: 1048576.0 | grad norm: 79200.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33000/ 152972 | consumed samples: 11816384 | consumed tokens: 24199954432 | elapsed time per iteration (ms): 6072.8 | learning rate: 1.884E-04 | global batch size: 512 | lm loss: 2.098187E+00 | loss scale: 2097152.0 | grad norm: 178195.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 33000 | lm loss value: 2.043273E+00 | lm loss PPL: 7.715822E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 09:33:41,742] [INFO] [logging.py:68:log_dist] [Rank 1]
Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/mp_rank_01_model_states.pt
[2021-11-04 09:33:41,994] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/mp_rank_00_model_states.pt
[2021-11-04 09:33:42,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 09:33:42,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 09:33:42,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 09:33:42,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 09:33:42,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 09:33:42,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 09:33:42,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 09:33:42,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 09:33:42,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 09:33:42,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 09:33:42,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 09:33:42,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-04 09:33:42,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_15_optim_states.pt
successfully saved checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1498.88
iteration 33200/ 152972 | consumed samples: 11918784 | consumed tokens: 24409669632 | elapsed time per iteration (ms): 7105.0 | learning rate: 1.882E-04 | global batch size: 512 | lm loss: 2.071210E+00 | loss scale: 1048576.0 | grad norm: 90033.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33400/ 152972 | consumed samples: 12021184 | consumed tokens: 24619384832 | elapsed time per iteration (ms): 6158.6 | learning rate: 1.880E-04 | global batch size: 512 | lm loss: 2.091822E+00 | loss scale: 1048576.0 | grad norm: 85197.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33600/ 152972 | consumed samples: 12123584 | consumed tokens: 24829100032 | elapsed time per iteration (ms): 6330.3 | learning rate: 1.878E-04 | global batch size: 512 | lm loss: 2.090412E+00 | loss scale: 1048576.0 | grad norm: 89363.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33800/ 152972 | consumed samples: 12225984 | consumed tokens: 25038815232 | elapsed time per iteration (ms): 6316.8 | learning rate: 1.876E-04 | global batch size: 512 | lm loss: 2.065592E+00 | loss scale: 2097152.0 | grad norm: 171576.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 11:17:01,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=66, lr=[0.00018738857969774513, 0.00018738857969774513], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 34000 loss: 1.8980 iter time (s): 0.003 samples/sec: 168783.649
iteration 34000/ 152972 | consumed samples: 12328384 | consumed tokens: 25248530432 | elapsed time per iteration (ms): 6119.2 | learning rate: 1.874E-04 | global batch size: 512 | lm loss: 2.063387E+00 | loss scale: 2097152.0 | grad norm: 156525.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 34000 | lm loss value: 2.036580E+00 | lm loss PPL: 7.664353E+00 |
-------------------------------------------------------------------------------------------------
iteration 34200/ 152972 | consumed samples: 12430784 | consumed tokens: 25458245632 | elapsed time per iteration (ms): 7156.1 | learning rate: 1.872E-04 | global batch size: 512 | lm loss: 2.061139E+00 | loss scale: 1048576.0 | grad norm: 86702.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34400/ 152972 | consumed samples: 12533184 | consumed tokens: 25667960832 | elapsed time per iteration (ms): 6079.2 | learning rate: 1.870E-04 | global batch size: 512 | lm loss: 2.071460E+00 | loss scale: 1048576.0 | grad norm: 88209.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 12:11:17,372] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/mp_rank_00_model_states.pt
[2021-11-04 12:11:17,391] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint:
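Two relationships in the records above can be checked directly: the reported "lm loss PPL" is the exponential of the "lm loss value", and "consumed tokens" stays at a fixed multiple of "consumed samples", which suggests a sequence length of 2048 tokens per sample (this length is inferred from the ratio; the log itself does not state it). A quick sanity check, using figures copied verbatim from the log:

```python
import math

# Validation record at iteration 34000: (lm loss value, reported PPL).
loss, reported_ppl = 2.036580, 7.664353
# Perplexity is the exponential of the mean per-token cross-entropy loss.
assert abs(math.exp(loss) - reported_ppl) < 1e-4

# (consumed samples, consumed tokens) pairs from the iteration records.
records = [
    (11918784, 24409669632),  # iteration 33200
    (12021184, 24619384832),  # iteration 33400
    (12328384, 25248530432),  # iteration 34000
]
# Every record implies the same tokens-per-sample ratio.
seq_lens = {tokens // samples for samples, tokens in records}
print(seq_lens)  # → {2048}
```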
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/mp_rank_01_model_states.pt [2021-11-04 12:11:17,775] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1120.16
iteration 34600/ 152972 | consumed samples: 12635584 | consumed tokens: 25877676032 | elapsed time per iteration (ms): 6196.1 | learning rate: 1.868E-04 | global batch size: 512 | lm loss: 2.065935E+00 | loss scale: 2097152.0 | grad norm: 263118.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34800/ 152972 | consumed samples: 12737984 | consumed tokens: 26087391232 | elapsed time per iteration (ms): 6079.0 | learning rate: 1.865E-04 | global batch size: 512 | lm loss: 2.050999E+00 | loss scale: 2097152.0 | grad norm: 153991.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35000/ 152972 | consumed samples: 12840384 | consumed tokens: 26297106432 | elapsed time per iteration (ms): 6069.3 | learning rate: 1.863E-04 | global batch size: 512 | lm loss: 2.064597E+00 | loss scale: 1048576.0 | grad norm: 80465.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 35000 | lm loss value: 2.041205E+00 | lm loss PPL: 7.699883E+00 |
-------------------------------------------------------------------------------------------------
iteration 35200/ 152972 | consumed samples: 12942784 | consumed tokens: 26506821632 | elapsed time per iteration (ms): 7104.2 | learning rate: 1.861E-04 | global batch size: 512 | lm loss: 2.308412E+00 | loss scale: 32768.0 | grad norm: 15419.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35400/ 152972 | consumed samples: 13045184 | consumed tokens: 26716536832 | elapsed time per iteration (ms): 6068.8 | learning rate: 1.859E-04 | global batch size: 512 | lm loss: 2.139549E+00 | loss scale: 32768.0 | grad norm: 2699.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35600/ 152972 | consumed samples: 13147584 | consumed tokens: 26926252032 | elapsed time per iteration (ms): 6053.8 | learning rate: 1.857E-04 | global batch size: 512 | lm loss: 2.072784E+00 | loss scale: 32768.0 | grad norm: 2697.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35800/ 152972 | consumed samples: 13249984 | consumed tokens: 27135967232 | elapsed time per iteration (ms): 6055.5 | learning rate: 1.855E-04 | global batch size: 512 | lm loss: 2.069739E+00 | loss scale: 65536.0 | grad norm: 5183.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 14:46:46,818] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=74, lr=[0.00018523568489549322, 0.00018523568489549322], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 36000 loss: 1.9417 iter time (s): 0.003 samples/sec: 169401.892
iteration 36000/ 152972 | consumed samples: 13352384 | consumed tokens: 27345682432 | elapsed time per iteration (ms): 6063.3 | learning rate: 1.852E-04 | global batch size: 512 | lm loss: 2.069384E+00 | loss scale: 65536.0 | grad norm: 4855.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 36000 | lm loss value: 2.035071E+00 | lm loss PPL: 7.652793E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 14:50:13,431] [INFO] [logging.py:68:log_dist] [Rank 0] Saving
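The loss-scale movements in the records above (2097152.0 dropping to 32768.0 around iteration 35200 while the skipped-iteration counter climbs from 66 to 74, then recovering through 65536.0) follow the usual fp16 dynamic loss-scaling pattern: halve the scale and skip the optimizer step whenever the gradients overflow, then double it again after a long run of overflow-free steps. A minimal sketch of that mechanism only; the growth interval of 2000 and the exact update rule here are generic illustrations, not this run's DeepSpeed configuration:

```python
class DynamicLossScaler:
    """Toy illustration of fp16 dynamic loss scaling (halve on overflow,
    double after a window of clean steps)."""

    def __init__(self, init_scale: float = 2.0 ** 20, growth_interval: int = 2000):
        self.scale = float(init_scale)
        self.growth_interval = growth_interval
        self.good_steps = 0
        self.skipped = 0  # mirrors the log's "skipped=" counter

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            self.scale /= 2          # back off; this optimizer step is skipped
            self.good_steps = 0
            self.skipped += 1
        else:
            self.good_steps += 1
            if self.good_steps >= self.growth_interval:
                self.scale *= 2      # probe a larger scale again
                self.good_steps = 0

scaler = DynamicLossScaler(init_scale=2.0 ** 21)
for _ in range(5):                   # five consecutive overflowing steps
    scaler.update(found_overflow=True)
print(int(scaler.scale), scaler.skipped)  # 2097152 halved five times → 65536 5
```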
model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/mp_rank_00_model_states.pt
[2021-11-04 14:50:13,458] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/mp_rank_01_model_states.pt
[2021-11-04 14:50:13,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[… further "zero checkpoint saved" lines elided, one per remaining zero_pp_rank_{0..3}_mp_rank_{00..15} optimizer shard …]
successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1139.27
iteration 36200/ 152972 | consumed samples: 13454784 | consumed tokens: 27555397632 | elapsed time per iteration (ms): 7166.9 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 2.067782E+00 | loss scale: 131072.0 | grad norm: 10772.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36400/ 152972 | consumed samples: 13557184 | consumed tokens: 27765112832 | elapsed time per iteration (ms): 6264.7 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 2.064779E+00 | loss scale: 131072.0 | grad norm: 9982.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36600/ 152972 | consumed samples: 13659584 | consumed tokens: 27974828032 | elapsed time per iteration (ms): 6172.6 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 2.058450E+00 | loss scale: 131072.0 | grad norm: 10480.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36800/ 152972 | consumed samples: 13761984 | consumed tokens: 28184543232 | elapsed time per iteration (ms): 6160.5 | learning rate: 1.843E-04 | global batch size: 512 | lm loss: 2.054813E+00 | loss scale: 262144.0 | grad norm: 21606.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37000/ 152972 | consumed samples: 13864384 | consumed tokens: 28394258432 | elapsed time per iteration (ms):
6155.4 | learning rate: 1.841E-04 | global batch size: 512 | lm loss: 2.065561E+00 | loss scale: 262144.0 | grad norm: 22388.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 37000 | lm loss value: 2.038658E+00 | lm loss PPL: 7.680298E+00 |
-------------------------------------------------------------------------------------------------
iteration 37200/ 152972 | consumed samples: 13966784 | consumed tokens: 28603973632 | elapsed time per iteration (ms): 7324.4 | learning rate: 1.839E-04 | global batch size: 512 | lm loss: 2.053246E+00 | loss scale: 524288.0 | grad norm: 38638.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37400/ 152972 | consumed samples: 14069184 | consumed tokens: 28813688832 | elapsed time per iteration (ms): 6088.0 | learning rate: 1.836E-04 | global batch size: 512 | lm loss: 2.044884E+00 | loss scale: 524288.0 | grad norm: 42099.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 17:28:04,918] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/mp_rank_01_model_states.pt
[2021-11-04 17:28:04,926] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/mp_rank_00_model_states.pt
[2021-11-04 17:28:05,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/zero_pp_rank_2_mp_rank_04_optim_states.pt
[2021-11-04 17:28:05,294]
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/zero_pp_rank_0_mp_rank_09_optim_states.pt
[… further "zero checkpoint saved" lines elided, one per remaining zero_pp_rank_{0..3}_mp_rank_{00..15} optimizer shard …]
successfully saved checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1121.51
iteration 37600/ 152972 | consumed samples: 14171584 | consumed tokens: 29023404032 | elapsed time per iteration (ms): 6099.0 | learning rate: 1.834E-04 | global batch size: 512 | lm loss: 2.056508E+00 | loss scale: 524288.0 | grad norm: 42464.581 | num
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 37800/ 152972 | consumed samples: 14273984 | consumed tokens: 29233119232 | elapsed time per iteration (ms): 6074.2 | learning rate: 1.832E-04 | global batch size: 512 | lm loss: 2.055578E+00 | loss scale: 1048576.0 | grad norm: 81142.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-04 18:18:44,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=75, lr=[0.00018292011486489588, 0.00018292011486489588], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 38000/ 152972 | consumed samples: 14376384 | consumed tokens: 29442834432 | elapsed time per iteration (ms): 6083.7 | learning rate: 1.829E-04 | global batch size: 512 | lm loss: 2.050276E+00 | loss scale: 1048576.0 | grad norm: 83594.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 38000 loss: 2.0121 iter time (s): 0.003 samples/sec: 167999.346 ------------------------------------------------------------------------------------------------- validation loss at iteration 38000 | lm loss value: 2.026610E+00 | lm loss PPL: 7.588318E+00 | ------------------------------------------------------------------------------------------------- iteration 38200/ 152972 | consumed samples: 14478784 | consumed tokens: 29652549632 | elapsed time per iteration (ms): 7248.0 | learning rate: 1.827E-04 | global batch size: 512 | lm loss: 2.045791E+00 | loss scale: 1048576.0 | grad norm: 88471.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38400/ 152972 | consumed samples: 14581184 | consumed tokens: 29862264832 | elapsed time per iteration (ms): 6081.8 | learning rate: 1.824E-04 | global batch size: 512 | lm loss: 2.060999E+00 | loss scale: 1048576.0 | grad norm: 83390.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38600/ 152972 | consumed samples: 14683584 | 
consumed tokens: 30071980032 | elapsed time per iteration (ms): 6089.2 | learning rate: 1.822E-04 | global batch size: 512 | lm loss: 2.034178E+00 | loss scale: 1048576.0 | grad norm: 76433.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38800/ 152972 | consumed samples: 14785984 | consumed tokens: 30281695232 | elapsed time per iteration (ms): 6089.2 | learning rate: 1.820E-04 | global batch size: 512 | lm loss: 2.041228E+00 | loss scale: 1048576.0 | grad norm: 81479.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39000/ 152972 | consumed samples: 14888384 | consumed tokens: 30491410432 | elapsed time per iteration (ms): 6387.9 | learning rate: 1.817E-04 | global batch size: 512 | lm loss: 2.068646E+00 | loss scale: 2097152.0 | grad norm: 195257.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 39000 | lm loss value: 2.030519E+00 | lm loss PPL: 7.618039E+00 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints [2021-11-04 20:08:39,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/mp_rank_00_model_states.pt [2021-11-04 20:08:39,301] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/mp_rank_01_model_states.pt [2021-11-04 20:08:39,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,877] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,877] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1132.57 iteration 
39200/ 152972 | consumed samples: 14990784 | consumed tokens: 30701125632 | elapsed time per iteration (ms): 7188.0 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 2.043959E+00 | loss scale: 2097152.0 | grad norm: 163340.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39400/ 152972 | consumed samples: 15093184 | consumed tokens: 30910840832 | elapsed time per iteration (ms): 6078.2 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 2.050927E+00 | loss scale: 524288.0 | grad norm: 39812.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39600/ 152972 | consumed samples: 15195584 | consumed tokens: 31120556032 | elapsed time per iteration (ms): 6080.7 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.059096E+00 | loss scale: 524288.0 | grad norm: 36877.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39800/ 152972 | consumed samples: 15297984 | consumed tokens: 31330271232 | elapsed time per iteration (ms): 6096.6 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 2.034562E+00 | loss scale: 524288.0 | grad norm: 40947.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-04 21:50:08,652] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=81, lr=[0.0001804599959837998, 0.0001804599959837998], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 40000/ 152972 | consumed samples: 15400384 | consumed tokens: 31539986432 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 2.046368E+00 | loss scale: 1048576.0 | grad norm: 79641.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 40000 loss: 1.9234 iter time (s): 0.003 samples/sec: 168896.024 ------------------------------------------------------------------------------------------------- 
validation loss at iteration 40000 | lm loss value: 2.018202E+00 | lm loss PPL: 7.524780E+00 | ------------------------------------------------------------------------------------------------- iteration 40200/ 152972 | consumed samples: 15502784 | consumed tokens: 31749701632 | elapsed time per iteration (ms): 7223.0 | learning rate: 1.802E-04 | global batch size: 512 | lm loss: 2.039917E+00 | loss scale: 1048576.0 | grad norm: 82712.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | srun: Job step aborted: Waiting up to 62 seconds for job step to finish. Killing subprocess 122540 Killing subprocess 1284095 Killing subprocess 908359 Killing subprocess 252675 Killing subprocess 1964291 Killing subprocess 502625 Killing subprocess 2721736 Killing subprocess 122541 Killing subprocess 803909 Killing subprocess 1284096 Killing subprocess 1964292 Killing subprocess 908360 Killing subprocess 2133881 Killing subprocess 532510 Killing subprocess 252676 Killing subprocess 2791463 Killing subprocess 1284097 Killing subprocess 122542 Killing subprocess 908361 Killing subprocess 502626 Killing subprocess 1964293 Killing subprocess 803910 Killing subprocess 2721737 Killing subprocess 1288896 Killing subprocess 252677 Killing subprocess 122543 Killing subprocess 1284099 Killing subprocess 502627 slurmstepd: error: *** STEP 1825190.0 ON r6i3n0 CANCELLED AT 2021-11-04T22:24:00 *** Killing subprocess 908362 Killing subprocess 532511 Killing subprocess 2721738 Killing subprocess 2133882 Killing subprocess 1964294 Main process received SIGTERM, exiting Killing subprocess 2791464 Killing subprocess 532512 Killing subprocess 2133883 Killing subprocess 502629 Killing subprocess 803911 Killing subprocess 803912 Killing subprocess 532513 Killing subprocess 2791465 Main process received SIGTERM, exiting Killing subprocess 1288897 Killing subprocess 2133884 Killing subprocess 252678 Main process received SIGTERM, exiting Main process received SIGTERM, 
exiting Main process received SIGTERM, exiting Killing subprocess 2721739 Main process received SIGTERM, exiting Killing subprocess 2791466 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 1288898 Killing subprocess 1288899 Main process received SIGTERM, exiting Killing subprocess 980734 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 980735 Killing subprocess 980736 Main process received SIGTERM, exiting Killing subprocess 980737 Main process received SIGTERM, exiting Killing subprocess 429407 Killing subprocess 2991515 Killing subprocess 179727 Killing subprocess 429408 Killing subprocess 2991516 Killing subprocess 179728 Killing subprocess 429409 Killing subprocess 429411 Killing subprocess 2991517 Killing subprocess 2991518 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 179729 Killing subprocess 179730 Main process received SIGTERM, exiting