*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[banner repeated once per launched process]
2021-09-27 16:59:03.088018: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[line repeated once per process, timestamps ranging 16:59:03.088 to 16:59:04.108]
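For reference, a minimal sketch of tuning the variable the launcher defaults to 1, assuming a PyTorch-based training script (the value "4" is an arbitrary example; OMP_NUM_THREADS applies to any OpenMP-backed library):

    import os
    # Must be set before the first import of an OpenMP-backed library
    # (torch, numpy with OpenBLAS, ...), otherwise the thread pool is
    # already sized by the time the variable is read.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import torch
    # On OpenMP builds of PyTorch this reflects OMP_NUM_THREADS.
    print(torch.get_num_threads())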
---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name ninjaninjaninjaninja ...................................................... ..................[OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- op nameop name................ ................ ................ installed ................ installedinstalled ..installed ..compatible.... compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ op name ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name................ ................installed................................ installed.. installed.. installed .. compatible compatible compatible.. -------------------------------------------------- ----------------------------------------------------------------------------------------------------compatible [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name cpu_adamcpu_adam cpu_adam ...............cpu_adam ............... 
.............................. [YES] [YES][YES][YES]...... ..................[OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------- op name op name ................op name................ installedinstalled................................ ....installedinstalled ..compatible..compatible compatible --------------------------------------------------compatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES]cpu_adam cpu_adamcpu_adam...... ............................................. [OKAY] [YES] [YES][YES] .................. [OKAY][OKAY][OKAY] fused_adam .............fused_adamfused_adam [NO]fused_adam.......................... [NO].......[NO] ............. [OKAY] ....... ....... [NO][OKAY][OKAY]fused_lamb cpu_adamcpu_adamcpu_adam ...............cpu_adam.............................. [YES] ...............[YES] [YES] [YES] ............ ...... ......[OKAY][OKAY] [OKAY][OKAY] ninjaninjaninjaninja .................. ....................................[OKAY].................. fused_adam ............. [NO] ....... [OKAY]fused_adam .................... fused_lamb [NO] [OKAY] fused_lamb [OKAY][OKAY]--------------------------------------------------[OKAY] fused_adam .............fused_adam.............fused_lamb [NO].......................... [NO] .......[NO][NO]....... [OKAY] ....... .......[OKAY] .................... ............. [NO] fused_lamb[OKAY] .......[NO] .............[OKAY]....... fused_adamfused_adam fused_adam..........................fused_adam [NO].............[NO]............. .......[NO].......[NO] [OKAY].......[OKAY]....... [OKAY][OKAY] fused_lambfused_lamb --------------------------------------------------op name-------------------------------------------------- -------------------------------------------------- [OKAY]fused_lamb[OKAY] fused_lamb [NO][OKAY] ....... [OKAY] fused_lamb.......................... fused_lamb .............[NO] [NO] [NO].................... ....... [OKAY] ....... [OKAY][NO] op name................op name op name ................installed................ ................installed..installed installed compatible.... ..compatiblecompatible ............. fused_lamb............. [NO][NO]............. ..............[NO] [OKAY] [OKAY]sparse_attn ....... ............[OKAY] [NO] ....... [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY]sparse_attn [OKAY] ....... [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- transformersparse_attn sparse_attn ............ ........................ sparse_attn [NO][NO][NO] ................................. [OKAY][OKAY] [NO] [OKAY] ................... transformer[OKAY] sparse_attn sparse_attn sparse_attn............sparse_attn sparse_attn............[NO]............ ............ [NO] [NO].......[NO] [OKAY].............. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... [YES] ...... cpu_adam...............[YES][OKAY] .......transformer transformer[OKAY]stochastic_transformer............ 
[NO] ........................ .......transformer[NO][NO] [OKAY]....... ............ ....... [OKAY]transformer[OKAY] [OKAY] ............ op nameop name--------------------------------------------------op name ......[YES] ...............[OKAY] ......[YES] fused_adam......[OKAY] [NO]............. transformer .......[NO][NO] [OKAY] ....... ....... transformer [OKAY] [OKAY][NO]............ transformer[NO] transformer ...................transformer ............ [NO][OKAY] ............ ................................op name................ installedinstalled................ installed ......installed compatiblecompatible compatible ..-------------------------------------------------- -------------------------------------------------- [OKAY].............fused_adam ............ ....... [OKAY][NO] stochastic_transformer [OKAY] ....... . [OKAY][NO] [NO].......transformer stochastic_transformer ................... [OKAY] .[OKAY] [NO] [NO].......[NO] stochastic_transformer [OKAY]....... ....... ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] --------------------------------------------------compatible -------------------------------------------------- .............[NO] [NO]fused_adam....... ....................[OKAY] fused_adam[NO][OKAY] stochastic_transformer .......stochastic_transformer. [OKAY].[NO] [NO]stochastic_transformer....... stochastic_transformer [OKAY]....... . . [OKAY] [NO][NO] .[OKAY] [OKAY] stochastic_transformer[NO] ........ stochastic_transformer stochastic_transformer[OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ...............[YES]cpu_adam ............... ......[YES] ............... [YES]...... [OKAY] [YES] [OKAY]...... .............fused_lamb....... [NO].............fused_lamb[OKAY] [NO]....... .......[OKAY] [OKAY] stochastic_transformer ............... [OKAY][OKAY][NO] ....... [OKAY] [NO]. . ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] op nameop nameop name op name ................ ................ ................ installedinstalled................installed ....installed .. compatible compatible.. compatible ---------------------------------------------------------------------------------------------------- compatible ......[OKAY] [OKAY] [NO]............. ....... [NO]fused_lamb....... .......[OKAY]............. [OKAY] -------------------------------------------------- -------------------------------------------------- fused_adam fused_adam............. .............[NO]fused_adam fused_adam [NO] ....... .......................... ....... [NO][OKAY][NO][OKAY] [NO] [OKAY]....... [OKAY] cpu_adam cpu_adam............... cpu_adamcpu_adam...............[YES] ...............[YES]..................... [OKAY][YES] ...... ..............fused_lamb [OKAY]fused_lamb[OKAY]............. sparse_attnfused_lamb sparse_attn......................... ............[NO][NO] [NO]sparse_attn....... ..........................[OKAY] [OKAY] [NO] [OKAY]transformer [YES]......[OKAY] [OKAY]...... [OKAY]fused_adam .............[NO] [NO]....... fused_lamb.......fused_lamb [OKAY].............[OKAY]............. transformer ....... ........................[OKAY] ............. [NO] ....... fused_adam [OKAY].............fused_adam [NO] [NO]....... .......[OKAY] [OKAY] [NO][NO] transformer....... 
...................[OKAY] [OKAY]sparse_attn[NO] [NO] fused_adam ............. .......fused_lamb ............. [OKAY][NO]............. sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attn ....... sparse_attn[OKAY] ............ ............ [OKAY] [NO][NO] ............stochastic_transformer....... stochastic_transformer . [OKAY] [NO] [NO].......fused_lamb....... ....... ............. [OKAY][OKAY] [OKAY] [NO] fused_lamb ....... fused_lamb.............[OKAY] transformer .......transformer............ ....... [NO][OKAY] ............ [OKAY].......[NO] .[NO][NO] .......[NO].......stochastic_transformer [OKAY]....... [NO]............. .......[NO] [OKAY]....... sparse_attn[OKAY] transformer[OKAY] .......transformer............ [OKAY]............ . [OKAY][OKAY][NO] ....... [OKAY] ............ [NO] sparse_attn....... ............[OKAY] stochastic_transformer[NO] [NO]stochastic_transformer........ .......[NO].[OKAY] .......[OKAY][NO] [OKAY] stochastic_transformer transformer ............ [NO] ....... [OKAY] [NO] .......sparse_attn transformer sparse_attn............ [OKAY]............ ....... .stochastic_transformer[OKAY] stochastic_transformer . [NO] ....... [OKAY] ............[NO][NO] transformer ..............[NO] ............[OKAY][OKAY]....... .[NO] [NO]....... .......[OKAY] [OKAY] [NO][OKAY] stochastic_transformer .......transformer transformer [OKAY]. ............ ............[NO][NO] stochastic_transformer [NO] ....... ........[OKAY]....... [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op name op name op name op name................................ ................ installed ................installed installed.. compatibleinstalled.... --------------------------------------------------compatible ..compatible --------------------------------------------------compatible-------------------------------------------------- cpu_adam-------------------------------------------------- ............... cpu_adam[YES] cpu_adam..................... ...............cpu_adam[YES] [OKAY] [YES] ...... ..................... [OKAY][YES][OKAY] ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] fused_adam...... .............[OKAY] [NO] fused_adam.......fused_adam .............[OKAY] -------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name .............[NO] fused_adam fused_lamb [NO].................... .................... [OKAY] [NO][OKAY] [NO] op name................op name................ ................installed................installed ....installed installed compatible compatible.. .. 
-------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] cpu_adam cpu_adam[YES] ...... ............... ..................... [OKAY][YES] ..............fused_lamb fused_lamb[OKAY].............[OKAY] [NO] .................... [OKAY][NO]fused_lamb .................... [OKAY][NO] ....... [OKAY]sparse_attn [OKAY] [YES] ...... ...... [OKAY][OKAY] fused_adam ............ [NO] sparse_attn....... ............[OKAY] .............fused_adam [NO]............. .......[NO] fused_adam[OKAY]....... [NO]sparse_attn transformer ....... ............ sparse_attn............ [OKAY] [NO] [NO] ............ transformer....... ....... [NO][OKAY] ............ fused_adam .............[OKAY].............fused_lamb [NO][NO]............. fused_lamb .............. .............[NO][OKAY] [NO].......[OKAY] .......[OKAY]fused_lamb [OKAY]fused_lamb .......[NO][OKAY]transformer .......[OKAY]............ stochastic_transformer [OKAY] [NO] ............. .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn transformer. stochastic_transformer ................... [NO] . [OKAY][NO]....... sparse_attn ........................ [NO][NO] .............. [OKAY][OKAY]sparse_attn sparse_attn .......[NO]stochastic_transformer [OKAY] [OKAY]........ [OKAY] [NO]stochastic_transformer ........ [OKAY][NO] ....... [OKAY] transformer........................ transformer [NO] ............[NO] ................... [NO][NO].......[OKAY] .............. [OKAY] [OKAY][OKAY]transformer transformer............ ............stochastic_transformer[NO]stochastic_transformer .[NO]........ .......[NO][NO][OKAY] ..............[OKAY] [OKAY]stochastic_transformer[OKAY] stochastic_transformer. .[NO] [NO]....... .......[OKAY] [OKAY] ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name--------------------------------------------------op name op name ................op name ................ ................installed................ installed .. installed ..installed compatible compatible ....-------------------------------------------------- --------------------------------------------------compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam ..................... [OKAY]cpu_adam ............... ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] [YES] .....................[YES] [OKAY][YES]...... ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name fused_adam ...... [OKAY]............. [NO][OKAY]fused_adam ....... .............[OKAY] op name op nameop name ................ ................ ................................installed installed installed installed.... ....compatiblecompatible compatible----------------------------------------------------------------------------------------------------compatible [NO] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
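The table above is the standard DeepSpeed extension-op summary (the same one the `ds_report` tool prints), emitted once per rank: only `cpu_adam` was prebuilt into this wheel, and every other op would be JIT-compiled with ninja on first use. As a rough sketch of querying this programmatically, assuming the `op_builder` layout of DeepSpeed 0.4.x (module paths moved in later releases):

```python
# Sketch only: assumes DeepSpeed 0.4.x exposes op builders under
# deepspeed.ops.op_builder; later releases reorganized these modules.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the "compatible" column above: it checks
    # whether the op *could* be JIT-built on this system, not whether the op
    # was prebuilt into the installed wheel.
    print(builder.name, builder.is_compatible())
```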
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
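The async_io warning is harmless for this run (the async I/O op is not used), and the message itself names the fix: install the system package `libaio-dev` and let the op rebuild. A minimal sketch for checking whether the underlying shared library is visible to the loader:

```python
# Minimal check for the shared library behind DeepSpeed's async_io op.
# find_library returns something like "libaio.so.1" when libaio is installed,
# or None in the situation that triggers the warning above.
import ctypes.util

print(ctypes.util.find_library("aio"))
```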
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utilsutils .................................... [YES][YES] ...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch install path.................... 1.8.1............... torch cuda version ............... 11.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']nvcc version ..................... torch version11.2 ....................deepspeed install path 1.8.1........... torch cuda version['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ............... deepspeed info11.1 ...................nvcc version .....................0.4.2+72ce55a, 72ce55a, big-science 11.2deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: DeepSpeed general environment info: DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1 torch cuda version ...............torch cuda version 11.1............... nvcc version11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... torch cuda version1.8.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... async_io ............... [NO] ....... [NO] ............... 
torch cuda version11.1 ............... nvcc version11.1 ..................... nvcc version11.2 ..................... deepspeed install path11.2 ........... deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']........... deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed info deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 deepspeed wheel compiled w.................... ......0.4.2+72ce55a, 72ce55a, big-science torch 1.8, cuda 11.1deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.1 .................... 1.8.1torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 ..................... nvcc version11.2 ..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ................... deepspeed info0.4.2+72ce55a, 72ce55a, big-science ................... deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 
0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version .................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 
1.8.1 torch cuda version ...............['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 11.1 nvcc versiontorch version ......................................... 11.21.8.1 deepspeed install path torch cuda version........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']11.1 deepspeed infonvcc version ........................................ 0.4.2+72ce55a, 72ce55a, big-science11.2 deepspeed install pathdeepspeed wheel compiled w. ................. torch 1.8, cuda 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versionDeepSpeed general environment info:torch cuda version .............................. 11.111.1 nvcc versionnvcc version .....................torch install path..................... 11.2 11.2...............deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']deepspeed info deepspeed info................... torch version...................0.4.2+72ce55a, 72ce55a, big-science ....................0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. 1.8.1deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch cuda versiontorch 1.8, cuda 11.1 ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... 
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
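The fields in the block above come from simple runtime introspection (DeepSpeed ships its own reporter, the `ds_report` CLI). A minimal sketch that reproduces the torch-related lines with public APIs, for illustration only, not DeepSpeed's reporting code:

```python
# Illustrative sketch of where the torch fields of the report come from.
import torch

print("torch install path ...............", torch.__path__)   # package location, printed as a list
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
```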
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1269478.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
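Two consistency checks fall directly out of the arguments above: the parallel degrees must factor the world size, and the ramp-up message follows from rampup_batch_size = ['32', '32', '2_000_000']. A hedged sketch of that arithmetic (an illustration of the scheme, not Megatron's code):

```python
# Sanity checks implied by the arguments above (illustrative, not Megatron's code).

# world_size = data-parallel x tensor-parallel x pipeline-parallel
assert 4 * 4 * 4 == 64

# samples per optimizer step = micro_batch x data_parallel x grad_accum,
# so 512 = 8 x 4 x 16 (DeepSpeed reports gradient_accumulation_steps=16 below).
assert 8 * 4 * 16 == 512

# Batch-size ramp-up: start=32, increment=32, spread over 2_000_000 samples.
def global_batch_size(consumed, start=32, inc=32, end=512, rampup=2_000_000):
    increments = (end - start) // inc                        # 15 steps of +32
    steps = min(increments, consumed * increments // rampup) # one step per ~133_333 samples
    return start + inc * steps

assert global_batch_size(0) == 32
assert global_batch_size(2_000_000) == 512
```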
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-27 16:59:15,029] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.319 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
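The fused-kernel step above goes through PyTorch's JIT C++/CUDA extension loader (torch.utils.cpp_extension.load), which emits the "Emitting ninja build file ..." and "Loading extension module ..." lines and also raises the compiler-mismatch warning. A hedged sketch of that call; the source file names below are placeholders, not Megatron's actual arguments:

```python
# Illustrative use of PyTorch's JIT extension loader behind the log lines above.
from torch.utils import cpp_extension

fused_softmax = cpp_extension.load(
    name="scaled_upper_triang_masked_softmax_cuda",
    sources=[
        "scaled_upper_triang_masked_softmax.cpp",   # hypothetical paths
        "scaled_upper_triang_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,  # prints the ninja build-file and module-loading messages
)
```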
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.113 seconds
time to initialize megatron (seconds): -40.411
[after megatron is initialized] datetime: 2021-09-27 16:59:35
building GPT model ...
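The embedding built next uses the padded vocabulary reported earlier: 50257 tokens plus 431 dummies gives 50688, the smallest size divisible by make_vocab_size_divisible_by (128) times the tensor-parallel degree (4). A sketch of that padding rule, mirroring Megatron's scheme as an illustration:

```python
# Vocab padding arithmetic behind "padded vocab (size: 50257) ... (new size: 50688)".
def pad_vocab(orig_size, divisible_by=128, tp=4):
    multiple = divisible_by * tp                        # 512
    return ((orig_size + multiple - 1) // multiple) * multiple

assert pad_vocab(50257) == 50688   # 431 dummy tokens added
```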
[2021-09-27 16:59:35,685] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-27 16:59:35,687] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 16:59:35,688] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.41 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
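In the topology above, the model-parallel coordinate varies fastest, then data, then pipe. A short sketch of the implied rank layout (an illustration of the mapping, not DeepSpeed's code):

```python
# Rank layout implied by the topology dump above (tp=4, dp=4, pp=4 -> 64 ranks).
TP, DP = 4, 4

def coord_to_rank(pipe, data, model):
    # model varies fastest, then data, then pipe
    return model + TP * data + TP * DP * pipe

assert coord_to_rank(0, 0, 3) == 3
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(3, 3, 3) == 63
```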
[2021-09-27 16:59:36,209] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
[2021-09-27 16:59:36,626] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 16:59:36,627] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-27 16:59:36,627] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.76 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
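The per-rank parameter counts above are consistent with the standard GPT block layout at hidden_size=2048 and the padded vocab of 50688, sharded over tp=4. A hedged back-of-the-envelope check, assuming Megatron's usual column/row-parallel sharding (weights sharded, row-parallel biases and LayerNorms replicated):

```python
# Back-of-the-envelope check of the per-rank parameter counts above
# (hidden=2048, padded vocab=50688, tensor-parallel degree 4).
h, vocab, tp = 2048, 50688, 4

qkv   = (3 * h * h + 3 * h) // tp   # column-parallel QKV (weight + bias sharded)
proj  = h * h // tp + h             # row-parallel attn projection (bias replicated)
fc1   = (4 * h * h + 4 * h) // tp   # column-parallel MLP input
fc2   = 4 * h * h // tp + h         # row-parallel MLP output (bias replicated)
norms = 2 * 2 * h                   # two LayerNorms, replicated
layer = qkv + proj + fc1 + fc2 + norms

emb = vocab * h // tp               # tensor-sliced (tied) embedding shard

assert 6 * layer == 75_592_704                  # middle stages: 6 layers
assert 6 * layer + emb == 101_544_960           # first stage: + embedding
assert 6 * layer + emb + 2 * h == 101_549_056   # last stage: + final LayerNorm

# The engine reports TOTAL_PARAMS=1417117696 and UNIQUE_PARAMS=1313308672 below;
# the difference is exactly the tied embedding, counted on both first and last stage.
assert 1_417_117_696 - 1_313_308_672 == vocab * h
```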
[2021-09-27 16:59:36,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-27 16:59:36,718] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 16:59:36,718] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 16:59:36,718] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 16:59:36,718] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 16:59:36,718] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 16:59:36,718] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 16:59:36,718] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 16:59:36,718] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 16:59:36,718] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 16:59:36,718] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 16:59:36,952] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 16:59:36,952] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 16:59:36,952] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 16:59:36,952] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-27 16:59:36,953] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-27 16:59:36,954] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-27 16:59:36,954] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-27 16:59:36,955] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:36,955] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,244] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,245] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 16:59:37,353] [WARNING] [engine.py:1744:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
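In the stage summaries above, TOTAL_PARAMS exceeds UNIQUE_PARAMS by exactly 103,809,024 parameters, i.e. one extra copy of the tied word-embedding matrix, which the first and last pipeline stages each hold. A quick check, assuming a padded vocabulary of 50688 and hidden size 2048 for this 1B3 model (neither value is printed in this excerpt):

# Sketch: the TOTAL vs UNIQUE parameter gap equals one tied embedding copy.
# Assumptions (not printed in this log): padded vocab 50688, hidden size 2048.
total_params = 1_417_117_696
unique_params = 1_313_308_672
padded_vocab, hidden_size = 50_688, 2_048

assert total_params - unique_params == padded_vocab * hidden_size  # 103,809,024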
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints will not load any checkpoints and will start from random
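The "Unable to find latest file" warnings are expected on a fresh run: when no checkpoint tag is given, DeepSpeed resolves which checkpoint to resume from by reading a one-line plain-text file named latest in the checkpoint directory. A minimal sketch of that convention (the tag global_step1000 is a hypothetical example):

# Sketch: DeepSpeed's "latest" file is a one-line text file naming the checkpoint
# tag to resume from; on a first run it does not exist yet, hence the warnings.
from pathlib import Path

ckpt_dir = Path("/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints")
latest = ckpt_dir / "latest"
tag = latest.read_text().strip() if latest.exists() else None
# e.g. tag == "global_step1000" (hypothetical) once a checkpoint has been saved;
# here tag is None, so the engine starts from randomly initialized weights.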
time (ms) | load-checkpoint: 7.79
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 16:59:37
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.126223 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.309 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.346 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.056 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 16:59:43
done with setup ...
training ...
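The epoch counts above come out of simple arithmetic: the index builder covers as many epochs as are needed to supply the requested number of samples. One pass over the validation split yields fewer than the 7,833,600 samples requested, so two epochs are indexed, while the train split covers its 73,242,187-sample target in a single epoch. A rough check (treating samples as split evenly across the reported epochs, which is an approximation):

import math

# Sketch: epochs the dataset index must cover to reach the requested sample count.
# Approximation: samples are assumed evenly split across the epochs the log reports.
def epochs_needed(target_samples: int, samples_per_epoch: int) -> int:
    return math.ceil(target_samples / samples_per_epoch)

# validation: 13,854,322 samples over 2 epochs -> ~6,927,161 per epoch
assert epochs_needed(7_833_600, 13_854_322 // 2) == 2
# train: one epoch already yields 131,537,224 samples, above the 73,242,187 target
assert epochs_needed(73_242_187, 131_537_224) == 1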
time (ms) | model-and-optimizer-setup: 1772.09 | train/valid/test-data-iterators-setup: 5577.15
Number of parameters: 1.62471936 billion
Number of parameters: 1.209483264 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[... the same five distinct per-rank "Number of parameters" messages repeated, interleaved, across all ranks ...]
[before the start of training step] datetime: 2021-09-27 16:59:43
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
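The two families of counts presumably differ by the embedding parameters: ranks reporting ~1.62B would be the pipeline stages holding the embedding layers, while interior stages report ~1.21B. The "without embeddings" figure agrees with the standard dense-transformer estimate of roughly 12·L·h² + 13·L·h. A back-of-envelope check, assuming hidden size h = 2048 for this 1B3 model (not shown in this excerpt; the 24 layers match the checkpointing line above):

```python
# Rough cross-check of the "without embeddings" parameter count.
# Assumption: hidden size 2048; 24 layers per the checkpointing line.
L, h = 24, 2048
attention = 4 * h * h + 4 * h      # QKV + output projections, with biases
mlp       = 8 * h * h + 5 * h      # h -> 4h -> h, with biases
norms     = 2 * (2 * h)            # two LayerNorms per block
per_block = attention + mlp + norms
print(L * per_block / 1e9)         # ~1.2086, close to the logged 1.209483264
```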
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 16:59:43,832] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4150.0 | max reserved: 4150.0
[Rank 1] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3974.0 | max reserved: 3974.0
[Rank 2] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4118.0 | max reserved: 4118.0
[Rank 3] (after 200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3974.0 | max reserved: 3974.0
[Rank 16] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 17] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 18] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3868.0 | max reserved: 3868.0
[Rank 19] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 32] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3804.0 | max reserved: 3804.0
[Rank 33] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3708.0 | max reserved: 3708.0
[Rank 34] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 35] (after 200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 48] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 6056.0 | max reserved: 6056.0
[Rank 49] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5484.0 | max reserved: 5484.0
[Rank 50] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5264.0 | max reserved: 5264.0
[Rank 51] (after 200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5184.0 | max reserved: 5184.0
iteration 200/ 152972 | consumed samples: 6400 | elapsed time per iteration (ms): 1327.5 | learning rate: 6.991E-06 | global batch size: 32 | lm loss: 8.445860E+00 | loss scale: 4096.0 | grad norm: 5217.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 400/ 152972 | consumed samples: 12800 | elapsed time per iteration (ms): 1259.8 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 6.949808E+00 | loss scale: 4096.0 | grad norm: 7072.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 600/ 152972 | consumed samples: 19200 | elapsed time per iteration (ms): 1260.4 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.509160E+00 | loss scale: 8192.0 | grad norm: 9807.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
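The per-rank memory lines are assembled from torch.cuda allocator statistics. A minimal sketch of how such a report can be produced (the rank/iteration plumbing is assumed; the four torch.cuda calls are the real APIs behind these numbers):

```python
import torch

# Sketch: reproduce one "[Rank N] (after K iterations) memory (MB)" line
# from the current process's CUDA allocator statistics.
def memory_report(rank: int, iteration: int) -> str:
    mb = 1024 * 1024
    return (
        f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
        f" | allocated: {torch.cuda.memory_allocated() / mb}"
        f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
        f" | reserved: {torch.cuda.memory_reserved() / mb}"
        f" | max reserved: {torch.cuda.max_memory_reserved() / mb}"
    )
```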
iteration 800/ 152972 | consumed samples: 25600 | elapsed time per iteration (ms): 1260.0 | learning rate: 2.796E-05 | global batch size: 32 | lm loss: 6.201863E+00 | loss scale: 8192.0 | grad norm: 7757.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1000/ 152972 | consumed samples: 32000 | elapsed time per iteration (ms): 1258.3 | learning rate: 3.495E-05 | global batch size: 32 | lm loss: 5.958127E+00 | loss scale: 16384.0 | grad norm: 8164.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 1000 | lm loss value: 5.786907E+00 | lm loss PPL: 3.260030E+02 |
------------------------------------------------------------------------------------------------
iteration 1200/ 152972 | consumed samples: 38400 | elapsed time per iteration (ms): 1415.3 | learning rate: 4.194E-05 | global batch size: 32 | lm loss: 5.749456E+00 | loss scale: 16384.0 | grad norm: 16830.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1400/ 152972 | consumed samples: 44800 | elapsed time per iteration (ms): 1258.8 | learning rate: 4.893E-05 | global batch size: 32 | lm loss: 5.540604E+00 | loss scale: 16384.0 | grad norm: 14275.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 17:31:58,218] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step1500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1572.29
iteration 1600/ 152972 | consumed samples: 51200 | elapsed time per iteration (ms): 1269.7 | learning rate: 5.592E-05 | global batch size: 32 | lm loss: 5.372899E+00 | loss scale: 32768.0 | grad norm: 23634.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 1800/ 152972 | consumed samples: 57600 | elapsed time per iteration (ms): 1261.8 | learning rate: 6.291E-05 | global batch size: 32 | lm loss: 5.217889E+00 | loss scale: 32768.0 | grad norm: 21545.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 17:42:30,184] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, lr=[6.983534037847136e-05, 6.983534037847136e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 2000 loss: 4.9108 iter time (s): 0.001 samples/sec: 50939.574
iteration 2000/ 152972 | consumed samples: 64000 | elapsed time per iteration (ms): 1260.5 | learning rate: 6.984E-05 | global batch size: 32 | lm loss: 5.363922E+00 | loss scale: 16384.0 | grad norm: 12768.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 2000 | lm loss value: 4.962508E+00 | lm loss PPL: 1.429518E+02 |
------------------------------------------------------------------------------------------------
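The "lm loss PPL" column is simply exp of the lm loss. A one-liner check against the two validation reports above:

```python
import math

# Perplexity is exp(cross-entropy loss); checking the logged values:
for loss in (5.786907, 4.962508):
    print(math.exp(loss))
# -> 326.00... and 142.95..., matching 3.260030E+02 and 1.429518E+02
```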
iteration 2200/ 152972 | consumed samples: 70400 | elapsed time per iteration (ms): 1407.3 | learning rate: 7.683E-05 | global batch size: 32 | lm loss: 4.894614E+00 | loss scale: 16384.0 | grad norm: 9693.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2400/ 152972 | consumed samples: 76800 | elapsed time per iteration (ms): 1263.8 | learning rate: 8.382E-05 | global batch size: 32 | lm loss: 4.742365E+00 | loss scale: 16384.0 | grad norm: 11512.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2600/ 152972 | consumed samples: 83200 | elapsed time per iteration (ms): 1265.1 | learning rate: 9.081E-05 | global batch size: 32 | lm loss: 4.640353E+00 | loss scale: 32768.0 | grad norm: 16408.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 2800/ 152972 | consumed samples: 89600 | elapsed time per iteration (ms): 1268.9 | learning rate: 9.780E-05 | global batch size: 32 | lm loss: 4.562429E+00 | loss scale: 32768.0 | grad norm: 17465.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3000/ 152972 | consumed samples: 96000 | elapsed time per iteration (ms): 1272.7 | learning rate: 1.048E-04 | global batch size: 32 | lm loss: 4.480088E+00 | loss scale: 65536.0 | grad norm: 29013.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 3000 | lm loss value: 4.390939E+00 | lm loss PPL: 8.071619E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 18:04:34,840] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step3000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1703.84
iteration 3200/ 152972 | consumed samples: 102400 | elapsed time per iteration (ms): 1417.8 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 4.428154E+00 | loss scale: 65536.0 | grad norm: 27260.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3400/ 152972 | consumed samples: 108800 | elapsed time per iteration (ms): 1264.6 | learning rate: 1.188E-04 | global batch size: 32 | lm loss: 4.375950E+00 | loss scale: 65536.0 | grad norm: 30398.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3600/ 152972 | consumed samples: 115200 | elapsed time per iteration (ms): 1269.6 | learning rate: 1.258E-04 | global batch size: 32 | lm loss: 4.317261E+00 | loss scale: 131072.0 | grad norm: 77605.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 3800/ 152972 | consumed samples: 121600 | elapsed time per iteration (ms): 1268.3 | learning rate: 1.327E-04 | global batch size: 32 | lm loss: 4.276650E+00 | loss scale: 131072.0 | grad norm: 51425.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-27 18:25:43,201] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=4, lr=[0.00013967068075694273, 0.00013967068075694273], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 4.2108 iter time (s): 0.001 samples/sec: 50745.813
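Two arithmetic properties of these lines are easy to verify: the learning rate during warmup is linear in the number of non-skipped optimizer steps, and consumed samples are iteration × global batch size while the batch size is constant. A quick check against the two [Rank 0] step lines above:

```python
# The warmup lr is linear in real (non-skipped) steps.
lr_2000 = 6.983534037847136e-05   # step=2000, skipped=2 -> 1998 real steps
lr_4000 = 0.00013967068075694273  # step=4000, skipped=4 -> 3996 real steps
slope = lr_4000 / 3996
print(slope * 1998)  # reproduces lr_2000 exactly
print(slope * 200)   # ~6.991e-06, the lr logged at iteration 200

# Consumed samples while the global batch size is fixed at 32:
print(200 * 32, 2000 * 32)  # 6400 and 64000, as logged
```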
iteration 4000/ 152972 | consumed samples: 128000 | elapsed time per iteration (ms): 1267.0 | learning rate: 1.397E-04 | global batch size: 32 | lm loss: 4.234697E+00 | loss scale: 65536.0 | grad norm: 24346.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
------------------------------------------------------------------------------------------------
validation loss at iteration 4000 | lm loss value: 4.166348E+00 | lm loss PPL: 6.447954E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | elapsed time per iteration (ms): 1475.5 | learning rate: 1.477E-04 | global batch size: 64 | lm loss: 4.958833E+00 | loss scale: 16384.0 | grad norm: 16732.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 4400/ 152972 | consumed samples: 148256 | elapsed time per iteration (ms): 1663.9 | learning rate: 1.617E-04 | global batch size: 64 | lm loss: 5.272735E+00 | loss scale: 16384.0 | grad norm: 4236.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[... the following traceback was printed, interleaved, by four ranks; shown once ...]
Traceback (most recent call last):
  File "/gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/pretrain_gpt.py", line 229, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 149, in pretrain
    iteration = train(forward_step_func,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 692, in train
    train_step(forward_step_func,
  File "/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/training.py", line 389, in train_step
    loss = model[0].train_batch(data_iter=data_iterator)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 291, in train_batch
    self._exec_schedule(sched)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 1237, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/pipe/engine.py", line 895, in _exec_send_grads
    inputs = tuple([part.to_meta(), part.data()])
  File "/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed/runtime/utils.py", line 612, in to_meta
    return torch.LongTensor(data=meta).to(self.orig_device)
RuntimeError: CUDA error: unknown error
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: unknown error
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1616554793803/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x150a1bd182f2 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x150a1bd1567b in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x150a1bf71219 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x150a1bd003a4 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e0e5a (0x150a72c76e5a in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e0ef1 (0x150a72c76ef1 in /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1932c6 (0x55900bc892c6 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #7: <unknown function> + 0x1592ac (0x55900bc4f2ac in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #8: <unknown function> + 0x158e77 (0x55900bc4ee77 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #9: <unknown function> + 0x158e60 (0x55900bc4ee60 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #10: <unknown function> + 0x158e60 (0x55900bc4ee60 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #11: <unknown function> + 0x176057 (0x55900bc6c057 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #12: PyDict_SetItemString + 0x61 (0x55900bc8d3c1 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #13: PyImport_Cleanup + 0x9d (0x55900bccbaad in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #14: Py_FinalizeEx + 0x79 (0x55900bcfda49 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #15: Py_RunMain + 0x183 (0x55900bcff893 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #16: Py_BytesMain + 0x39 (0x55900bcffca9 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
frame #17: __libc_start_main + 0xf3 (0x150aa3ca6873 in /lib64/libc.so.6)
frame #18: <unknown function> + 0x1e21c7 (0x55900bcd81c7 in /gpfswork/rech/six/commun/conda/hf-prod/bin/python)
[... three more identical "terminate called after throwing an instance of 'c10::Error'" blocks from the other crashed ranks, differing only in addresses ...]
srun: error: Node failure on r13i7n1
slurmstepd: error: *** STEP 1269478.0 ON r13i2n6 CANCELLED AT 2021-09-27T18:57:05 DUE TO NODE FAILURE, SEE SLURMCTLD LOG FOR DETAILS ***
Killing subprocess 2366
Killing subprocess 2367
Killing subprocess 2368
Killing subprocess 2369
Main process received SIGTERM, exiting
[... "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" messages repeated by the launcher on every remaining node ...]
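The RuntimeError surfaces in DeepSpeed's to_meta, which, per the traceback, packs tensor metadata into a small CPU LongTensor and copies it to the tensor's original device while sending partitioned gradients between pipeline stages; once the node hosting that GPU died, this copy was the first operation to touch the broken CUDA context. A hypothetical reduction of the pattern (the metadata layout below is illustrative, not DeepSpeed's exact encoding):

```python
import torch

def to_meta_sketch(t: torch.Tensor) -> torch.Tensor:
    # Illustrative metadata layout: ndims followed by the dims themselves.
    meta = [t.dim(), *t.shape]
    # Mirrors utils.py line 612 above: build on CPU, then move to the
    # original device. If that device's CUDA context is gone (here, a node
    # failure mid-step), the .to() call raises "CUDA error: unknown error".
    return torch.LongTensor(data=meta).to(t.device)

if torch.cuda.is_available():
    x = torch.zeros(4, 2048, device="cuda")
    print(to_meta_sketch(x))  # tensor([2, 4, 2048], device='cuda:0')
```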
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[... the same OMP_NUM_THREADS notice repeated once per relaunched process ...]
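A minimal sketch of acting on that notice from Python: set the variable before numerical libraries initialize their thread pools, then raise it later if profiling shows the CPUs are underused (the value 1 simply mirrors the launcher's default):

```python
import os

# Pin OpenMP to one thread per process before importing numerical libraries.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import torch  # imported after the env var so its intra-op pool honors it

torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))
```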
2021-09-27 18:58:08.525097: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[... the same libcudart.so.11.0 dso_loader message repeated by every restarted process between 18:58:08 and 18:58:12 ...]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[... the same op report printed, interleaved, by every process; shown once ...]
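This report can be regenerated on demand outside of a training launch. Assumption: this DeepSpeed install ships the `ds_report` console script (backed by the deepspeed.env_report module), which prints the same ninja / op-compatibility table:

```python
import subprocess

# Regenerate the DeepSpeed op report above on demand (assumes the
# `ds_report` console script is on PATH, as with a standard pip install).
subprocess.run(["ds_report"], check=True)
```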
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninjaninja .................. ..................[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- sparse_attn ............ [NO] ....... [OKAY] op name op name................ ................installed ..installed compatible.. compatible-------------------------------------------------- transformer ............ [NO] ....... [OKAY] -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] cpu_adam ............... cpu_adam[YES] ..................... [YES] [OKAY]...... [OKAY] fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... [OKAY] fused_lamb ............. fused_lamb[NO] .................... [NO][OKAY] ....... [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ninja .................. ninja[OKAY]ninja .................. --------------------------------------------------.................. [OKAY][OKAY]op name ................---------------------------------------------------------------------------------------------------- installedop nameop name .................................. compatibleinstalledinstalled .... -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam .................................... [YES] [OKAY] [YES] ...... ......[OKAY] [OKAY] fused_adam ............. fused_adam[NO]fused_adam .................... .............[NO] [OKAY] [NO] ....... .......[OKAY] [OKAY]fused_lamb fused_lamb............. fused_lamb.............[NO] .............[NO]....... [NO].......[OKAY] .......[OKAY] [OKAY] sparse_attnsparse_attn ........................sparse_attn [NO][NO]............ ..............[NO] [OKAY][OKAY]....... [OKAY] transformer transformer............transformer ............[NO]............ [NO][NO]....... .......[OKAY]....... [OKAY][OKAY] stochastic_transformer .stochastic_transformer stochastic_transformer[NO]. .......[NO] . [OKAY]....... [NO] [OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_ioasync_io .............................. [NO][NO] .............. [NO][NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. transformer_inference[NO] ......... [NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... utils[OKAY] .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] .......quantizer [OKAY].............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. transformer_inference .. [NO] ....... [OKAY] async_io ...............utils [NO].................. 
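The op table above can be reproduced outside the launcher. A minimal sketch, assuming the `deepspeed.ops.op_builder` layout shipped with DeepSpeed 0.4.x (the `ds_report` CLI prints the same table):

```python
# Minimal sketch: query op compatibility the way the report above does.
# Assumes the deepspeed.ops.op_builder layout of DeepSpeed 0.4.x.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
    # is_compatible() corresponds to the [OKAY] column above.
    print(builder.NAME, builder.is_compatible())
```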
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
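The same facts can be read programmatically; a sketch using public torch/deepspeed attributes:

```python
# Sketch: print the versions the "DeepSpeed general environment info"
# block reports, from public attributes.
import torch
import deepspeed

print("torch version .....", torch.__version__)      # 1.8.1 in the log above
print("torch cuda version.", torch.version.cuda)     # 11.1
print("deepspeed info ....", deepspeed.__version__)  # 0.4.2+72ce55a
```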
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
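The unknown hash is a fallback: Megatron resolves the hash by shelling out to `git`, which is not on PATH on these compute nodes, as the `/bin/sh: line 0: type: git: not found` line shows. A sketch of the same check:

```python
# Sketch: reproduce the git_hash=unknown fallback seen in the log above.
# When git is absent from PATH, the hash is reported as "unknown".
import shutil
import subprocess

if shutil.which("git") is None:
    git_hash = "unknown"
else:
    git_hash = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
print(f"git_hash={git_hash}")
```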
compatible[OKAY] -------------------------------------------------- cpu_adamfused_adam ...............ninja.............ninja [NO] [YES].................. ......................... ......[OKAY][OKAY][OKAY] [OKAY] ----------------------------------------------------------------------------------------------------fused_lamb .............op name op name [NO]fused_adam................ .................................... installed [OKAY]installed [NO].. .. .......compatible [OKAY]compatible-------------------------------------------------- sparse_attn-------------------------------------------------- ............ fused_lamb [NO]cpu_adam............. .......[NO]............... cpu_adam .......[OKAY] ...............[YES][OKAY]transformer ..................[YES] [OKAY][NO]...... .......[OKAY] sparse_attn[OKAY] ............fused_adam [NO]stochastic_transformer ..................... fused_adam[NO][OKAY][NO] ....... .............transformer ....... [OKAY] ............[NO][OKAY] [NO]fused_lamb....... ....... ............. [OKAY] [OKAY] [NO] fused_lamb....... .............[OKAY]stochastic_transformer [NO]. .......[NO] .......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO][NO] .............. [OKAY][OKAY] transformer stochastic_transformer............ .[NO] [NO]....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] utils....... ..................[OKAY] [YES] ...... [OKAY] utils quantizer.................. ..............[YES] [NO]...... .......[OKAY] [OKAY] quantizer-------------------------------------------------- .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY] [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name op name ................................ ................ ................installed installed installed.. installed ..compatible.. .. compatible-------------------------------------------------- compatible--------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam.....................cpu_adam ...............[YES] [OKAY] ............... ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name................ ................ installed................ ................installed .. installed compatible..installed.. ...... [YES][YES] ............ [OKAY][OKAY] [OKAY] compatible--------------------------------------------------..compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam............... ...... [YES]............... ............... [OKAY]......[YES] fused_adam ............. [NO] .......fused_adam fused_adam [OKAY] ............. [YES] [OKAY]............ [OKAY][OKAY] ............. fused_adam [NO] fused_lamb[NO]............. ........................... [OKAY][OKAY][NO] fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY] fused_adam [NO]....... .......fused_lamb [OKAY]fused_lamb [OKAY]............. ............. ..........................[NO] fused_lamb [NO][NO] ....... ............. [OKAY]....... ....... [NO][OKAY][OKAY] .......fused_lamb [OKAY].............fused_lamb ............. [NO][NO] fused_lamb ....... ....... ............. [OKAY]sparse_attn[NO][OKAY] ................... [OKAY][NO] ....... [OKAY] fused_lamb [NO] ............. .................... [NO][OKAY] [NO] .......sparse_attn [OKAY]................... [NO][OKAY] transformer ............sparse_attnsparse_attn [NO]........................ .......[NO] [NO][OKAY] ....... [OKAY]sparse_attn .......sparse_attn....... stochastic_transformer[OKAY] ............ .[OKAY] ............ [NO] sparse_attntransformer....... ............[OKAY]sparse_attn............ [NO]transformer .......transformer[NO] ............ ................... [OKAY] [OKAY][NO] [NO] [NO]............ transformer ....... [NO] [NO] ............ [OKAY]....... [NO]....... 
[OKAY]transformer[OKAY]....... .............. [OKAY][OKAY] ............[OKAY]transformer [NO]stochastic_transformer............ ....... stochastic_transformer. [NO] [OKAY] . transformer ............stochastic_transformerstochastic_transformer [NO]. . ....... [NO][NO] .............. [OKAY][OKAY][OKAY] [NO]....... [NO].......[OKAY] stochastic_transformer ....... [OKAY] .[OKAY]stochastic_transformer stochastic_transformer . [NO] ....... [OKAY] [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op nameop name ................................op nameop name installedinstalled ................ .................. .. installedcompatible compatibleinstalled .. -------------------------------------------------- ..--------------------------------------------------compatible compatible -------------------------------------------------- --------------------------------------------------cpu_adam ............... cpu_adam[YES] .....................cpu_adam cpu_adam[YES]...............[OKAY] .....................[YES] [OKAY]......[YES] [OKAY]...... fused_adam[OKAY] ............. ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................ ................installed ................ ................ installed.. installed installed compatible.. .. .. --------------------------------------------------compatible [NO]fused_adam .................... fused_adam[OKAY][NO] fused_adam ....... .............fused_lamb [OKAY]............. compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adam[OKAY] [NO]............. [NO]fused_lamb ....... [NO]............. ....... [OKAY] ....... [NO][OKAY] [OKAY] .......fused_lamb [OKAY]............. cpu_adamcpu_adam............... ..............................[YES] [YES][YES] fused_adam............ ...................[OKAY] [OKAY] fused_lamb [NO]............. .......[NO] [OKAY].......sparse_attn [OKAY]............sparse_attn [OKAY] [NO] ....... [OKAY] ............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn fused_lambfused_adamfused_adam ....................................... fused_adam [NO] .......[NO] [NO]............. [OKAY] ....... ....... transformer ........................sparse_attntransformer [NO][NO]........................ .......[NO]....... [NO] ....... [OKAY][OKAY] ....... [OKAY] [OKAY] [NO] [OKAY] ....... fused_lamb[OKAY] fused_lamb [OKAY]stochastic_transformertransformer ............. sparse_attn fused_lamb............. [NO] ............ .......[NO] ............. [NO] .......[NO][OKAY]....... [OKAY] [OKAY] stochastic_transformer . transformer............ .[NO] [NO]............ [NO]....... ....... [NO] ....... [OKAY][OKAY].......[OKAY] [OKAY] ....... [OKAY] transformer ............ [NO]sparse_attn ....... ............[OKAY] sparse_attn stochastic_transformer stochastic_transformer. .[NO] [NO]....... .......[OKAY] [OKAY] [NO] ................... sparse_attnstochastic_transformer [NO] [OKAY] . ...................[NO]transformer [NO] [OKAY]................... [OKAY].......transformer [NO] [OKAY] ............ 
....... [NO][OKAY] .......transformer [OKAY]stochastic_transformer ............ . [NO]stochastic_transformer [NO]........ [OKAY][NO] ....... .......[OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name................op name ................................installed................ installedinstalled..installed .. ....compatible compatiblecompatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES] cpu_adamcpu_adam ............... ..................... [YES][OKAY]...............[YES] ......[YES]...... [OKAY]......[OKAY] fused_adam [OKAY] ............. [NO] ....... fused_adam[OKAY] fused_adam .............fused_lamb............. fused_adam [NO]............. .............[NO] ....... [NO][OKAY][NO]....... ....... [OKAY] fused_lamb....... [OKAY] .............fused_lamb [OKAY] ............. [NO] [NO].......fused_lamb ....... [OKAY]sparse_attn ............. [OKAY] ............ [NO][NO] .............. [OKAY][OKAY] sparse_attn ............transformer sparse_attn[NO]............ ...................[NO] sparse_attn[OKAY].......[NO] ...................[OKAY] transformer [NO]............[OKAY] [NO].......stochastic_transformer .......transformer[OKAY]. [OKAY][NO]............ transformer ....... [NO]stochastic_transformer............[OKAY] . .......[NO] [NO][OKAY]....... .......[OKAY] [OKAY] stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
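Every op reported [NO] above will be JIT-compiled with ninja on first use, and async_io additionally needs libaio. A minimal pre-flight sketch, using only the standard library (the "aio" library name probed below is an assumption about how libaio surfaces on this system, not taken from DeepSpeed's code):

```python
# Pre-flight check for DeepSpeed JIT op builds; a sketch, not the
# project's actual tooling.
import shutil
from ctypes.util import find_library

print("ninja :", shutil.which("ninja") or "MISSING (JIT op builds will fail)")
print("nvcc  :", shutil.which("nvcc") or "MISSING (CUDA ops cannot compile)")
print("libaio:", find_library("aio") or "missing (async_io stays [NO])")
```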
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
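The same fields can be read programmatically. A minimal sketch using only public torch and deepspeed attributes (assumes both packages are importable in the environment that produced this log):

```python
# Sketch reproducing the "DeepSpeed general environment info" fields
# with public attributes only.
import os
import torch
import deepspeed

print("torch install path :", os.path.dirname(torch.__file__))
print("torch version      :", torch.__version__)      # 1.8.1 in this run
print("torch cuda version :", torch.version.cuda)     # 11.1 in this run
print("deepspeed version  :", deepspeed.__version__)  # 0.4.2+72ce55a here
```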
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1274190.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
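Two derived quantities in this dump are worth sanity-checking: the data-parallel size follows from the world size and the two model-parallel degrees, and the rampup line implies how long each batch-size increment lasts. A back-of-the-envelope sketch, assuming Megatron's world_size = tp x pp x dp factorization and an even split of the rampup samples across increments:

```python
# Sanity-check sketch for the numbers above; assumes Megatron-LM's
# world_size = tensor_mp * pipeline_mp * data_parallel factorization.
world_size, tp, pp = 64, 4, 4
dp = world_size // (tp * pp)
assert dp == 4                        # matches data-parallel-size: 4

# rampup: 32 -> 512 in steps of 32, spread over 2,000,000 samples
start, incr, final, samples = 32, 32, 512, 2_000_000
increments = (final - start) // incr  # 15 increments
print(f"~{samples // increments:,} samples per batch-size increment")
```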
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
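The padded-vocab line is plain arithmetic: the GPT-2 vocab is rounded up so it divides evenly across tensor-parallel ranks. A sketch, assuming the pad target is make_vocab_size_divisible_by (128, from the arguments above) times the tensor-parallel size (4):

```python
# Reconstruction of the vocab-padding arithmetic reported in the log.
orig_vocab, divisible_by, tp = 50257, 128, 4
multiple = divisible_by * tp                              # 512
padded = ((orig_vocab + multiple - 1) // multiple) * multiple
print(padded, padded - orig_vocab)                        # 50688 431
```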
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-27 18:58:25,222] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.306 seconds
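The model parallel seed of 3952 logged above is consistent with deriving a per-tensor-parallel-rank seed from the base seed. A hedged reconstruction (the +2718 offset and per-rank term follow Megatron-LM's convention as I understand it; they are assumptions, not read from this repo's code):

```python
# Hedged reconstruction of the seed line above.
seed, tp_rank = 1234, 0
model_parallel_seed = seed + 2718 + tp_rank  # 1234 + 2718 = 3952, as logged
data_parallel_seed = seed                    # 1234, as logged
print(model_parallel_seed, data_parallel_seed)
```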
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
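The compiler warning above comes from torch.utils.cpp_extension: the c++ driver on PATH is not the g++ toolchain PyTorch was built with. The builds still succeed here (ninja reports no work to do), but one way to avoid the warning, assuming g++ is available in the environment, is to select it explicitly through the CXX environment variable, which cpp_extension consults when picking a compiler. A minimal sketch, not a verified fix for this cluster:

    # Workaround sketch: point JIT extension builds at g++ (assumes g++
    # is on PATH). torch.utils.cpp_extension falls back to the CXX
    # environment variable when choosing a host compiler, so set it
    # before the fused kernels in megatron/fused_kernels are built.
    import os
    os.environ.setdefault("CXX", "g++")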
>>> done with compiling and loading fused kernels. Compilation time: 19.290 seconds
time to initialize megatron (seconds): 68.865
[after megatron is initialized] datetime: 2021-09-27 18:58:44
building GPT model ...
[2021-09-27 18:58:44,982] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-27 18:58:44,985] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-27 18:58:44,985] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.27 GB, percent = 20.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15,
ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31,
ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47,
ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
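The topology dict enumerates all 64 ranks of the 3D grid (pipe x data x model = 4 x 4 x 4). The printed mapping is row-major in (pipe, data, model), so each global rank can be recovered from its coordinates; a quick consistency check against the entries above, assuming that ordering:

    # Rank layout check for the 4 x 4 x 4 (pipe, data, model) grid above.
    PP, DP, TP = 4, 4, 4

    def coord_to_rank(pipe, data, model):
        # row-major in (pipe, data, model), matching the printed topology
        return pipe * (DP * TP) + data * TP + model

    assert coord_to_rank(0, 0, 0) == 0
    assert coord_to_rank(1, 0, 0) == 16  # ProcessCoord(pipe=1, data=0, model=0): 16
    assert coord_to_rank(3, 3, 3) == 63  # last rank in the grid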
[2021-09-27 18:58:45,517] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
[2021-09-27 18:58:45,891] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-27 18:58:45,891] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-27 18:58:45,892] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.66 GB, percent = 20.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
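The per-stage counts above reconcile exactly with the TOTAL_PARAMS/UNIQUE_PARAMS figures the DeepSpeed engine prints further below: one pipeline replica is the four stage counts of a single tensor slice times the tensor-parallel size of 4, and the gap between total and unique is one extra copy of the tied 50688 x 2048 embedding, which both the first and last stage hold (the hidden size of 2048 is implied by dividing that gap by the padded vocab). The arithmetic, using only numbers from the log:

    # Reconciling the logged per-rank counts with TOTAL_PARAMS/UNIQUE_PARAMS.
    stage_params = [101_544_960, 75_592_704, 75_592_704, 101_549_056]  # per TP slice
    tp = 4

    total = sum(stage_params) * tp
    assert total == 1_417_117_696                  # TOTAL_PARAMS in the log

    embedding = 50_688 * 2_048                     # padded vocab x hidden size
    assert embedding == 103_809_024
    assert total - embedding == 1_313_308_672      # UNIQUE_PARAMS in the log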
[2021-09-27 18:58:45,911] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-27 18:58:45,978] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-27 18:58:45,978] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-27 18:58:45,978] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-27 18:58:45,978] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-27 18:58:45,978] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-27 18:58:45,978] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-27 18:58:45,978] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-27 18:58:45,978] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-27 18:58:45,978] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-27 18:58:45,978] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-27 18:58:46,230] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-27 18:58:46,230] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-27 18:58:46,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-27 18:58:46,230] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ {"enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. {"stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false}
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-09-27 18:58:46,232] [INFO] [config.py:906:print] json = {"train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": {"stage": 1}, "fp16": {"enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12}, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false}
[2021-09-27 18:58:46,232] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
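DeepSpeed's batch sizes in the config dump are mutually constrained: train_batch_size must equal the per-GPU micro-batch size times the gradient-accumulation steps times the data-parallel world size. With TP=4 and PP=4 on the 64 ranks of this run the data-parallel size is 4, which matches the world_size of 4 printed above (DeepSpeed's batch math uses the data-parallel group, not the 64 global ranks). A quick consistency check on the logged values:

    # DeepSpeed batch-size invariant, using values from the config dump.
    micro_batch = 8       # train_micro_batch_size_per_gpu
    grad_accum = 16       # gradient_accumulation_steps
    gpus, tp, pp = 64, 4, 4
    dp = gpus // (tp * pp)                        # 4 data-parallel replicas

    assert micro_batch * grad_accum * dp == 512   # train_batch_size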
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-27 18:58:46,521] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 3000
time (ms) | load-checkpoint: 2026.42
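Each rank reports loading 4 ZeRO state_dicts, presumably because the run uses ZeRO stage 1 over a data-parallel group of 4: the optimizer state is partitioned across the data-parallel ranks, so the checkpoint holds one shard per data-parallel rank and each loader reads the shards it needs to rebuild its own partition. A toy sketch of the stage-1 partitioning idea only, not DeepSpeed's implementation:

    # Toy sketch of ZeRO stage-1 optimizer-state partitioning
    # (illustrative only, not DeepSpeed's code). A flat parameter
    # buffer is split across the data-parallel group; each rank keeps
    # optimizer state for its shard alone.
    dp_size = 4
    num_params = 1_000_000                   # pretend flat fp32 buffer length

    shard_len = (num_params + dp_size - 1) // dp_size
    shards = [range(r * shard_len, min((r + 1) * shard_len, num_params))
              for r in range(dp_size)]

    # rank r saves/loads only its shard's optimizer state; after each
    # step the updated shards are all-gathered to refresh the weights
    assert sum(len(s) for s in shards) == num_params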
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters without embeddings: 1.209483264
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 
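The warning above explains the two families of numbers: with pipeline parallelism (PP) greater than 1, the first and last pipeline stages each hold a copy of the (tied) word embeddings, so naively summing per-rank counts counts the embedding parameters twice. A minimal sketch of the correction, not Megatron-DeepSpeed code; all sizes below are illustrative assumptions, not this run's exact values:

    # Sketch: remove the duplicated embedding copy when summing per-stage counts.
    def corrected_param_count(per_stage_counts, embedding_params, pp_size):
        """Sum per-stage parameter counts; with PP > 1, tied embeddings are
        counted on both the first and last stage, so subtract one copy."""
        total = sum(per_stage_counts)
        if pp_size > 1:
            total -= embedding_params  # drop the second (tied) embedding copy
        return total

    # Hypothetical example: two stages, embeddings present on both.
    stages = [0.9e9, 0.8e9]
    print(corrected_param_count(stages, embedding_params=0.1e9, pp_size=2))  # 1.6e9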
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-27 18:58:48
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.035264 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.325 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.365 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.058 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-27 18:58:54
done with setup ...
training ...
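The doc-idx / sample-idx / shuffle-idx files above are precomputed .npy index mappings, which explains why loading 131M+ samples takes well under a second: the arrays can be memory-mapped rather than read eagerly. A minimal sketch of loading them with plain NumPy (the path prefix is taken from the log; the loading code itself is illustrative, not the Megatron implementation):

    import numpy as np

    # Prefix from the train split above; swap in valid/test prefixes the same way.
    prefix = "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s"
    doc_idx = np.load(f"{prefix}_doc_idx.npy", mmap_mode="r")          # document order
    sample_idx = np.load(f"{prefix}_sample_idx.npy", mmap_mode="r")    # sample -> (doc, offset)
    shuffle_idx = np.load(f"{prefix}_shuffle_idx.npy", mmap_mode="r")  # shuffled sample order
    print(doc_idx.shape, sample_idx.shape, shuffle_idx.shape)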
time (ms) | model-and-optimizer-setup: 3775.03 | train/valid/test-data-iterators-setup: 5449.32
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
(per-rank counts repeated across all ranks; unique values shown once)
[before the start of training step] datetime: 2021-09-27 18:58:54
[2021-09-27 18:58:54,952] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-27 18:58:54,953] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
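The "Activation Checkpointing Information" block refers to DeepSpeed's activation checkpointing over the 24 transformer layers: activations inside a checkpointed block are not kept for backward, they are recomputed, trading compute for memory. A minimal PyTorch sketch of the underlying idea using torch.utils.checkpoint (not the DeepSpeed API this run actually uses):

    import torch
    from torch.utils.checkpoint import checkpoint

    # One "layer" standing in for a transformer block; sizes are illustrative.
    layer = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU())
    x = torch.randn(4, 512, requires_grad=True)

    # Forward does not store intermediate activations; backward recomputes them.
    y = checkpoint(layer, x)
    y.sum().backward()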
[Rank 34] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3868.0 | max reserved: 3868.0
[Rank 18] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 2] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3958.0 | max reserved: 3958.0
[Rank 50] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5008.0 | max reserved: 5008.0
[Rank 51] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5420.0 | max reserved: 5420.0
iteration 3200/ 152972 | consumed samples: 102400 | elapsed time per iteration (ms): 1316.9 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 4.363139E+00 | loss scale: 65536.0 | grad norm: 26420.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 19] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3932.0 | max reserved: 3932.0
[Rank 35] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3708.0 | max reserved: 3708.0
[Rank 3] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4150.0 | max reserved: 4150.0
[Rank 0] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 4118.0 | max reserved: 4118.0
[Rank 32] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3644.0 | max reserved: 3644.0
[Rank 16] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3724.0 | max reserved: 3724.0
[Rank 48] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5880.0 | max reserved: 5880.0
[Rank 17] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3772.0 | max reserved: 3772.0
[Rank 1] (after 3200 iterations) memory (MB) | allocated: 514.96142578125 | max allocated: 2555.3828125 | reserved: 3958.0 | max reserved: 3958.0
[Rank 33] (after 3200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2410.7431640625 | reserved: 3628.0 | max reserved: 3628.0
[Rank 49] (after 3200 iterations) memory (MB) | allocated: 1430.91943359375 | max allocated: 3257.17822265625 | reserved: 5152.0 | max reserved: 5152.0
time (ms)
iteration 3400/ 152972 | consumed samples: 108800 | elapsed time per iteration (ms): 1252.2 | learning rate: 1.187E-04 | global batch size: 32 | lm loss: 4.345543E+00 | loss scale: 16384.0 | grad norm: 8962.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 3600/ 152972 | consumed samples: 115200 | elapsed time per iteration (ms): 1251.5 | learning rate: 1.257E-04 | global batch size: 32 | lm loss: 4.301370E+00 | loss scale: 16384.0 | grad norm: 14676.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
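The per-rank "[Rank N] (after 3200 iterations) memory (MB)" lines above report CUDA caching-allocator statistics. A minimal sketch of how such numbers can be produced with standard PyTorch APIs (the exact reporting helper in Megatron may differ):

    import torch

    def memory_report(tag):
        """Print allocator stats in MB, mirroring the log's four columns."""
        mb = 1024 * 1024
        print(f"{tag} memory (MB)"
              f" | allocated: {torch.cuda.memory_allocated() / mb}"
              f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
              f" | reserved: {torch.cuda.memory_reserved() / mb}"
              f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")

    if torch.cuda.is_available():
        memory_report("[Rank 0]")

Note that "reserved" exceeds "allocated" because the caching allocator keeps freed blocks for reuse, which matches the gaps visible in the log.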
iteration 3800/ 152972 | consumed samples: 121600 | elapsed time per iteration (ms): 1254.5 | learning rate: 1.326E-04 | global batch size: 32 | lm loss: 4.290591E+00 | loss scale: 16384.0 | grad norm: 7084.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 19:20:00,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=5, lr=[0.0001396357281341307, 0.0001396357281341307], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 4.2480 iter time (s): 0.001 samples/sec: 51211.008
iteration 4000/ 152972 | consumed samples: 128000 | elapsed time per iteration (ms): 1254.6 | learning rate: 1.396E-04 | global batch size: 32 | lm loss: 4.282688E+00 | loss scale: 32768.0 | grad norm: 13175.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 4000 | lm loss value: 4.203053E+00 | lm loss PPL: 6.689027E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | elapsed time per iteration (ms): 1450.2 | learning rate: 1.478E-04 | global batch size: 64 | lm loss: 4.278505E+00 | loss scale: 32768.0 | grad norm: 19492.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4400/ 152972 | consumed samples: 148256 | elapsed time per iteration (ms): 1569.6 | learning rate: 1.618E-04 | global batch size: 64 | lm loss: 4.200408E+00 | loss scale: 65536.0 | grad norm: 17302.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 19:32:42,114] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step4500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1624.04
iteration 4600/ 152972 | consumed samples: 161056 | elapsed time per iteration (ms): 1577.3 | learning rate: 1.757E-04 | global batch size: 64 | lm loss: 4.158590E+00 | loss scale: 65536.0 | grad norm: 90090.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4800/ 152972 | consumed samples: 173856 | elapsed time per iteration (ms): 1566.8 | learning rate: 1.897E-04 | global batch size: 64 | lm loss: 4.134281E+00 | loss scale: 65536.0 | grad norm: 16840.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5000/ 152972 | consumed samples: 186656 | elapsed time per iteration (ms): 1566.7 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.118727E+00 | loss scale: 65536.0 | grad norm: 23340.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 5000 | lm loss value: 4.035117E+00 | lm loss PPL: 5.654952E+01 |
------------------------------------------------------------------------------------------------
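The "lm loss PPL" column in the validation blocks is simply the exponential of the language-modeling loss (perplexity = exp(cross-entropy in nats)). A one-line check against the iteration-4000 numbers from the log:

    import math

    # validation lm loss 4.203053 -> PPL ~66.89, matching "lm loss PPL: 6.689027E+01"
    print(math.exp(4.203053))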
iteration 5200/ 152972 | consumed samples: 199456 | elapsed time per iteration (ms): 1766.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.092298E+00 | loss scale: 65536.0 | grad norm: 18294.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5400/ 152972 | consumed samples: 212256 | elapsed time per iteration (ms): 1569.5 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.052300E+00 | loss scale: 65536.0 | grad norm: 16701.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5600/ 152972 | consumed samples: 225056 | elapsed time per iteration (ms): 1570.4 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.024185E+00 | loss scale: 131072.0 | grad norm: 50413.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5800/ 152972 | consumed samples: 237856 | elapsed time per iteration (ms): 1584.9 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.012891E+00 | loss scale: 131072.0 | grad norm: 198634.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 20:12:38,845] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=9, lr=[0.00019999960413909058, 0.00019999960413909058], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 6000/ 152972 | consumed samples: 250656 | elapsed time per iteration (ms): 1568.1 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 4.016407E+00 | loss scale: 65536.0 | grad norm: 20235.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 6000 loss: 3.9158 iter time (s): 0.001 samples/sec: 81758.619
------------------------------------------------------------------------------------------------
 validation loss at iteration 6000 | lm loss value: 3.954991E+00 | lm loss PPL: 5.219524E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 20:13:35,546] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step6000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1432.83
iteration 6200/ 152972 | consumed samples: 263456 | elapsed time per iteration (ms): 1856.1 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.960543E+00 | loss scale: 65536.0 | grad norm: 14112.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 6400/ 152972 | consumed samples: 281024 | elapsed time per iteration (ms): 1815.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.917191E+00 | loss scale: 65536.0 | grad norm: 17575.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 6600/ 152972 | consumed samples: 300224 | elapsed time per iteration (ms): 1876.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.867980E+00 | loss scale: 131072.0 | grad norm: 26876.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
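Throughout the log, "loss scale" moves in powers of two (16384 -> 32768 -> 65536 -> 131072 ...) and the "skipped=N" counters in the step lines grow whenever it backs off. This is dynamic fp16 loss scaling: a step whose gradients contain inf/nan is skipped and the scale halved; after a window of clean steps the scale is doubled again. A minimal generic sketch of that policy (window size and bounds below are assumptions, not this run's configuration):

    class DynamicLossScaler:
        """Toy dynamic loss scaler: halve on overflow, double after a clean window."""
        def __init__(self, scale=65536.0, window=1000, min_scale=1.0):
            self.scale = scale
            self.window = window
            self.min_scale = min_scale
            self.good_steps = 0

        def update(self, found_overflow):
            if found_overflow:
                self.scale = max(self.scale / 2, self.min_scale)  # back off
                self.good_steps = 0
                return False  # caller skips this optimizer step
            self.good_steps += 1
            if self.good_steps % self.window == 0:
                self.scale *= 2  # grow again after a stable window
            return True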
iteration 6800/ 152972 | consumed samples: 319424 | elapsed time per iteration (ms): 1896.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.842974E+00 | loss scale: 131072.0 | grad norm: 24631.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7000/ 152972 | consumed samples: 338624 | elapsed time per iteration (ms): 1884.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.790887E+00 | loss scale: 262144.0 | grad norm: 52351.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 7000 | lm loss value: 3.730043E+00 | lm loss PPL: 4.168089E+01 |
------------------------------------------------------------------------------------------------
iteration 7200/ 152972 | consumed samples: 357824 | elapsed time per iteration (ms): 2177.1 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.771359E+00 | loss scale: 262144.0 | grad norm: 47723.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7400/ 152972 | consumed samples: 377024 | elapsed time per iteration (ms): 1883.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.776500E+00 | loss scale: 131072.0 | grad norm: 23440.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 21:00:25,144] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step7500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1568.05
iteration 7600/ 152972 | consumed samples: 396224 | elapsed time per iteration (ms): 1889.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 3.765444E+00 | loss scale: 131072.0 | grad norm: 24113.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 7800/ 152972 | consumed samples: 420544 | elapsed time per iteration (ms): 2132.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.705071E+00 | loss scale: 262144.0 | grad norm: 59311.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 21:17:59,449] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=13, lr=[0.00019999396297621752, 0.00019999396297621752], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 8000/ 152972 | consumed samples: 446144 | elapsed time per iteration (ms): 2191.2 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.683059E+00 | loss scale: 131072.0 | grad norm: 22629.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 8000 loss: 3.6834 iter time (s): 0.001 samples/sec: 116551.898
------------------------------------------------------------------------------------------------
 validation loss at iteration 8000 | lm loss value: 3.634668E+00 | lm loss PPL: 3.788926E+01 |
------------------------------------------------------------------------------------------------
iteration 8200/ 152972 | consumed samples: 471744 | elapsed time per iteration (ms): 2475.1 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.726107E+00 | loss scale: 32768.0 | grad norm: 5902.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8400/ 152972 | consumed samples: 497344 | elapsed time per iteration (ms): 2199.1 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.711342E+00 | loss scale: 32768.0 | grad norm: 6906.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8600/ 152972 | consumed samples: 522944 | elapsed time per iteration (ms): 2199.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 3.635965E+00 | loss scale: 65536.0 | grad norm: 11140.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 8800/ 152972 | consumed samples: 552320 | elapsed time per iteration (ms): 2381.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.616158E+00 | loss scale: 65536.0 | grad norm: 9384.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9000/ 152972 | consumed samples: 584320 | elapsed time per iteration (ms): 2510.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.582701E+00 | loss scale: 65536.0 | grad norm: 9793.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
------------------------------------------------------------------------------------------------
 validation loss at iteration 9000 | lm loss value: 3.545391E+00 | lm loss PPL: 3.465323E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 21:58:24,829] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step9000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1598.09
iteration 9200/ 152972 | consumed samples: 616320 | elapsed time per iteration (ms): 2879.6 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.569264E+00 | loss scale: 131072.0 | grad norm: 20472.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9400/ 152972 | consumed samples: 648320 | elapsed time per iteration (ms): 2516.8 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 3.684855E+00 | loss scale: 32768.0 | grad norm: 45042.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9600/ 152972 | consumed samples: 683040 | elapsed time per iteration (ms): 2637.2 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.739268E+00 | loss scale: 32768.0 | grad norm: 4405.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 9800/ 152972 | consumed samples: 721440 | elapsed time per iteration (ms): 2813.7 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.553106E+00 | loss scale: 32768.0 | grad norm: 4393.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-27 22:42:46,273] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=18, lr=[0.0001999709489126401, 0.0001999709489126401], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: 3.5069 iter time (s): 0.001 samples/sec: 134411.325
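The "global batch size" column ramps in steps of 32 (32 -> 64 -> 96 -> ... -> 512 later in the log), which is the batch-size ramp-up Megatron exposes via a rampup schedule. A minimal sketch of such a schedule as a function of consumed samples; the increment interval below is an assumption for illustration, not this run's exact setting:

    def global_batch_size(consumed_samples, start=32, increment=32, final=512,
                          samples_per_increment=500_000):
        """Toy ramp: add `increment` every `samples_per_increment` samples, capped."""
        steps = consumed_samples // samples_per_increment
        return min(start + steps * increment, final)

    print(global_batch_size(0))           # 32 at the start of the ramp
    print(global_batch_size(10_000_000))  # capped at 512 once the ramp has finished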
iteration 10000/ 152972 | consumed samples: 759840 | elapsed time per iteration (ms): 2822.2 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.517623E+00 | loss scale: 65536.0 | grad norm: 14373.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 10000 | lm loss value: 3.466655E+00 | lm loss PPL: 3.202943E+01 |
-------------------------------------------------------------------------------------------------
iteration 10200/ 152972 | consumed samples: 798240 | elapsed time per iteration (ms): 3229.5 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 3.506298E+00 | loss scale: 65536.0 | grad norm: 8617.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 10400/ 152972 | consumed samples: 842720 | elapsed time per iteration (ms): 3153.6 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.490300E+00 | loss scale: 131072.0 | grad norm: 15166.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-27 23:09:16,854] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step10500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1556.54
iteration 10600/ 152972 | consumed samples: 887520 | elapsed time per iteration (ms): 3143.5 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.465656E+00 | loss scale: 131072.0 | grad norm: 18837.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 10800/ 152972 | consumed samples: 932320 | elapsed time per iteration (ms): 3184.8 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 3.453867E+00 | loss scale: 131072.0 | grad norm: 17371.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11000/ 152972 | consumed samples: 983360 | elapsed time per iteration (ms): 3442.0 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.432232E+00 | loss scale: 131072.0 | grad norm: 15859.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 11000 | lm loss value: 3.379507E+00 | lm loss PPL: 2.935629E+01 |
-------------------------------------------------------------------------------------------------
iteration 11200/ 152972 | consumed samples: 1034560 | elapsed time per iteration (ms): 3944.4 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.417679E+00 | loss scale: 131072.0 | grad norm: 16501.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11400/ 152972 | consumed samples: 1088128 | elapsed time per iteration (ms): 3564.3 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 3.408159E+00 | loss scale: 131072.0 | grad norm: 15030.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
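Each checkpoint above lands under .../checkpoints/global_step{N}/mp_rank_00_model_states.pt. A minimal sketch of writing a model state dict under that naming scheme with plain torch.save; DeepSpeed's save_checkpoint does considerably more (optimizer and lr-scheduler state, per-rank shards), so this is only the directory-layout idea:

    import os
    import torch

    def save_model_states(model, root, step, mp_rank=0):
        """Write {root}/global_step{step}/mp_rank_{NN}_model_states.pt."""
        step_dir = os.path.join(root, f"global_step{step}")
        os.makedirs(step_dir, exist_ok=True)
        torch.save({"module": model.state_dict()},
                   os.path.join(step_dir, f"mp_rank_{mp_rank:02d}_model_states.pt"))

    save_model_states(torch.nn.Linear(8, 8), "checkpoints", step=10500)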
iteration 11600/ 152972 | consumed samples: 1145728 | elapsed time per iteration (ms): 3764.1 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 3.673109E+00 | loss scale: 32768.0 | grad norm: 8315.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 11800/ 152972 | consumed samples: 1203680 | elapsed time per iteration (ms): 3772.1 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.480969E+00 | loss scale: 32768.0 | grad norm: 3638.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 00:40:19,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=24, lr=[0.00019989732423933654, 0.00019989732423933654], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 12000/ 152972 | consumed samples: 1267680 | elapsed time per iteration (ms): 4068.1 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.395316E+00 | loss scale: 32768.0 | grad norm: 3546.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 12000 loss: 3.3645 iter time (s): 0.002 samples/sec: 157205.384
-------------------------------------------------------------------------------------------------
 validation loss at iteration 12000 | lm loss value: 3.333246E+00 | lm loss PPL: 2.802916E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 00:42:11,836] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step12000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1483.97
iteration 12200/ 152972 | consumed samples: 1331680 | elapsed time per iteration (ms): 4631.9 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 3.371478E+00 | loss scale: 65536.0 | grad norm: 6871.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12400/ 152972 | consumed samples: 1401888 | elapsed time per iteration (ms): 4364.5 | learning rate: 1.999E-04 | global batch size: 352 | lm loss: 3.354478E+00 | loss scale: 65536.0 | grad norm: 7845.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12600/ 152972 | consumed samples: 1472768 | elapsed time per iteration (ms): 4401.1 | learning rate: 1.999E-04 | global batch size: 384 | lm loss: 3.341310E+00 | loss scale: 131072.0 | grad norm: 15380.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 12800/ 152972 | consumed samples: 1549568 | elapsed time per iteration (ms): 4694.2 | learning rate: 1.998E-04 | global batch size: 384 | lm loss: 3.330164E+00 | loss scale: 131072.0 | grad norm: 14484.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13000/ 152972 | consumed samples: 1628544 | elapsed time per iteration (ms): 4806.2 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 3.313997E+00 | loss scale: 131072.0 | grad norm: 13669.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 13000 | lm loss value: 3.262029E+00 | lm loss PPL: 2.610244E+01 |
-------------------------------------------------------------------------------------------------
iteration 13200/ 152972 | consumed samples: 1711744 | elapsed time per iteration (ms): 5741.3 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 3.354557E+00 | loss scale: 262144.0 | grad norm: 58512.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13400/ 152972 | consumed samples: 1799680 | elapsed time per iteration (ms): 5249.3 | learning rate: 1.998E-04 | global batch size: 448 | lm loss: 3.296909E+00 | loss scale: 262144.0 | grad norm: 27755.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 02:42:11,823] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step13500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1611.03
iteration 13600/ 152972 | consumed samples: 1890880 | elapsed time per iteration (ms): 5425.7 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 3.283506E+00 | loss scale: 524288.0 | grad norm: 46737.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 13800/ 152972 | consumed samples: 1986880 | elapsed time per iteration (ms): 5637.3 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 3.271662E+00 | loss scale: 524288.0 | grad norm: 47777.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 03:29:51,917] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=25, lr=[0.00019968259658442148, 0.00019968259658442148], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 14000/ 152972 | consumed samples: 2088384 | elapsed time per iteration (ms): 5910.3 | learning rate: 1.997E-04 | global batch size: 512 | lm loss: 3.259530E+00 | loss scale: 524288.0 | grad norm: 52261.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 14000 loss: 3.2571 iter time (s): 0.003 samples/sec: 171761.365
-------------------------------------------------------------------------------------------------
 validation loss at iteration 14000 | lm loss value: 3.209220E+00 | lm loss PPL: 2.475976E+01 |
-------------------------------------------------------------------------------------------------
iteration 14200/ 152972 | consumed samples: 2190784 | elapsed time per iteration (ms): 6811.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.249312E+00 | loss scale: 524288.0 | grad norm: 54061.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 14400/ 152972 | consumed samples: 2293184 | elapsed time per iteration (ms): 5947.2 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.241659E+00 | loss scale: 524288.0 | grad norm: 65493.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
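"consumed samples" advances by the global batch size on every iteration, so once the ramp has reached 512 each block of 200 logged iterations advances it by 200 * 512 = 102400, which is exactly the delta between consecutive log lines here. A one-line check against the iteration 14200 -> 14400 numbers:

    # 2293184 - 2190784 samples over 200 iterations at global batch size 512
    print(2293184 - 2190784 == 200 * 512)  # True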
iterations: 0 | number of nan iterations: 0 | time (ms) iteration 14800/ 152972 | consumed samples: 2497984 | elapsed time per iteration (ms): 5944.7 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 3.224388E+00 | loss scale: 1048576.0 | grad norm: 121723.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15000/ 152972 | consumed samples: 2600384 | elapsed time per iteration (ms): 5941.6 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 3.219751E+00 | loss scale: 524288.0 | grad norm: 54512.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 15000 | lm loss value: 3.168109E+00 | lm loss PPL: 2.376250E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-09-28 05:14:43,931] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step15000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1524.30 iteration 15200/ 152972 | consumed samples: 2702784 | elapsed time per iteration (ms): 6832.5 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 3.211065E+00 | loss scale: 524288.0 | grad norm: 109272.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15400/ 152972 | consumed samples: 2805184 | elapsed time per iteration (ms): 5932.3 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 5.582409E+00 | loss scale: 8192.0 | grad norm: 5190.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15600/ 152972 | consumed samples: 2907584 | elapsed time per iteration (ms): 5934.8 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 3.609198E+00 | loss scale: 8192.0 | grad norm: 999.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 15800/ 152972 | consumed samples: 3009984 | elapsed time per iteration (ms): 5938.9 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 3.244447E+00 | loss scale: 16384.0 | grad norm: 1567.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-09-28 06:53:45,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=35, lr=[0.00019925189380325714, 0.00019925189380325714], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 16000/ 152972 | consumed samples: 3112384 | elapsed time per iteration (ms): 5939.7 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 3.219918E+00 | loss scale: 16384.0 | grad norm: 1520.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 16000 loss: 3.2127 iter time (s): 0.003 samples/sec: 172380.960 ------------------------------------------------------------------------------------------------- validation loss at iteration 16000 | lm loss value: 3.161706E+00 | lm loss PPL: 2.361084E+01 | ------------------------------------------------------------------------------------------------- iteration 16200/ 
iteration 16200/ 152972 | consumed samples: 3214784 | elapsed time per iteration (ms): 6815.0 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 3.204937E+00 | loss scale: 16384.0 | grad norm: 1542.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 16400/ 152972 | consumed samples: 3317184 | elapsed time per iteration (ms): 5943.6 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 3.192268E+00 | loss scale: 32768.0 | grad norm: 3691.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 07:46:12,124] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step16500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1581.94
iteration 16600/ 152972 | consumed samples: 3419584 | elapsed time per iteration (ms): 5948.0 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 3.188400E+00 | loss scale: 32768.0 | grad norm: 3386.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 16800/ 152972 | consumed samples: 3521984 | elapsed time per iteration (ms): 5938.0 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 3.179675E+00 | loss scale: 65536.0 | grad norm: 7297.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17000/ 152972 | consumed samples: 3624384 | elapsed time per iteration (ms): 5947.5 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 3.172127E+00 | loss scale: 65536.0 | grad norm: 7117.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 17000 | lm loss value: 3.123308E+00 | lm loss PPL: 2.272142E+01 |
-------------------------------------------------------------------------------------------------
iteration 17200/ 152972 | consumed samples: 3726784 | elapsed time per iteration (ms): 6818.1 | learning rate: 1.989E-04 | global batch size: 512 | lm loss: 3.166840E+00 | loss scale: 65536.0 | grad norm: 8196.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17400/ 152972 | consumed samples: 3829184 | elapsed time per iteration (ms): 5943.5 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 3.166609E+00 | loss scale: 131072.0 | grad norm: 13701.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17600/ 152972 | consumed samples: 3931584 | elapsed time per iteration (ms): 5948.1 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 3.547299E+00 | loss scale: 32768.0 | grad norm: 40597.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 17800/ 152972 | consumed samples: 4033984 | elapsed time per iteration (ms): 5935.0 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 3.382232E+00 | loss scale: 32768.0 | grad norm: 3705.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
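The checkpoint files written above (mp_rank_00_model_states.pt and friends, one every save_interval = 1500 iterations) are ordinary torch-serialized objects, so they can be opened for inspection without launching training. A minimal sketch; unpickling may require the training code's modules on the path if custom classes were serialized, and the layout of the state dict is a DeepSpeed-version detail, so the script only lists keys rather than assuming any:

    import torch

    # Path copied from the save messages above.
    ckpt = ("/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/"
            "checkpoints/global_step16500/mp_rank_00_model_states.pt")

    state = torch.load(ckpt, map_location="cpu")  # no GPU needed to inspect
    for key, value in state.items():
        print(key, type(value).__name__)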
[2021-09-28 10:17:49,436] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=38, lr=[0.0001986378302594345, 0.0001986378302594345], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 18000/ 152972 | consumed samples: 4136384 | elapsed time per iteration (ms): 5983.1 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 3.181944E+00 | loss scale: 32768.0 | grad norm: 3197.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 18000 loss: 3.1506 iter time (s): 0.003 samples/sec: 171948.955
-------------------------------------------------------------------------------------------------
validation loss at iteration 18000 | lm loss value: 3.119682E+00 | lm loss PPL: 2.263917E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 10:20:47,056] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step18000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1526.21
iteration 18200/ 152972 | consumed samples: 4238784 | elapsed time per iteration (ms): 6837.9 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 3.161476E+00 | loss scale: 65536.0 | grad norm: 6685.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18400/ 152972 | consumed samples: 4341184 | elapsed time per iteration (ms): 5946.0 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 3.155458E+00 | loss scale: 65536.0 | grad norm: 6264.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18600/ 152972 | consumed samples: 4443584 | elapsed time per iteration (ms): 5944.7 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 3.145045E+00 | loss scale: 131072.0 | grad norm: 12548.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 18800/ 152972 | consumed samples: 4545984 | elapsed time per iteration (ms): 5942.5 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 3.142362E+00 | loss scale: 131072.0 | grad norm: 15242.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 19000/ 152972 | consumed samples: 4648384 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 3.137035E+00 | loss scale: 131072.0 | grad norm: 13829.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 19000 | lm loss value: 3.082390E+00 | lm loss PPL: 2.181047E+01 |
-------------------------------------------------------------------------------------------------
iteration 19200/ 152972 | consumed samples: 4750784 | elapsed time per iteration (ms): 6815.5 | learning rate: 1.982E-04 | global batch size: 512 | lm loss: 3.132610E+00 | loss scale: 262144.0 | grad norm: 30657.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
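In the validation blocks, "lm loss PPL" is simply the exponential of "lm loss value", so the reported pairs can be checked directly, e.g. for iteration 19000 above:

    import math

    lm_loss = 3.082390           # lm loss value at iteration 19000
    print(math.exp(lm_loss))     # 21.8104..., i.e. the reported PPL 2.181047E+01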
iteration 19400/ 152972 | consumed samples: 4853184 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.981E-04 | global batch size: 512 | lm loss: 3.128786E+00 | loss scale: 262144.0 | grad norm: 31589.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 12:52:22,656] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step19500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1441.76
iteration 19600/ 152972 | consumed samples: 4955584 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.980E-04 | global batch size: 512 | lm loss: 3.120208E+00 | loss scale: 524288.0 | grad norm: 49876.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 19800/ 152972 | consumed samples: 5057984 | elapsed time per iteration (ms): 5950.6 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 3.121297E+00 | loss scale: 524288.0 | grad norm: 53555.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-28 13:42:04,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=38, lr=[0.0001978414577067249, 0.0001978414577067249], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 20000 loss: 3.0782 iter time (s): 0.003 samples/sec: 172555.361
iteration 20000/ 152972 | consumed samples: 5160384 | elapsed time per iteration (ms): 5980.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.115301E+00 | loss scale: 524288.0 | grad norm: 56000.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 20000 | lm loss value: 3.064670E+00 | lm loss PPL: 2.142740E+01 |
-------------------------------------------------------------------------------------------------
iteration 20200/ 152972 | consumed samples: 5262784 | elapsed time per iteration (ms): 6830.4 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 3.113258E+00 | loss scale: 1048576.0 | grad norm: 103464.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 20400/ 152972 | consumed samples: 5365184 | elapsed time per iteration (ms): 5953.6 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 3.105831E+00 | loss scale: 1048576.0 | grad norm: 108251.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 20600/ 152972 | consumed samples: 5467584 | elapsed time per iteration (ms): 5970.4 | learning rate: 1.976E-04 | global batch size: 512 | lm loss: 3.102602E+00 | loss scale: 1048576.0 | grad norm: 103925.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 20631 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 14:47:40,955] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step20631/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 20631 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1557.86
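The progress records in this log follow a fixed "name: value | name: value" layout, so they are easy to scrape into structured data for plotting. A minimal sketch (the regex is written against the lines in this log, not against Megatron's logging code):

    import re

    ITER_RE = re.compile(r"iteration\s+(\d+)/\s*\d+\s*\|(.*)")

    def parse_iteration(line):
        """Turn one 'iteration N/ total | k: v | ...' record into a dict."""
        m = ITER_RE.search(line)
        if m is None:
            return None
        record = {"iteration": int(m.group(1))}
        for field in m.group(2).split("|"):
            key, sep, value = field.partition(":")
            if not sep:
                continue
            try:
                record[key.strip()] = float(value)
            except ValueError:
                record[key.strip()] = value.strip()
        return record

    line = ("iteration 20000/ 152972 | consumed samples: 5160384 | "
            "lm loss: 3.115301E+00 | loss scale: 524288.0 |")
    print(parse_iteration(line))
    # {'iteration': 20000, 'consumed samples': 5160384.0,
    #  'lm loss': 3.115301, 'loss scale': 524288.0}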
[exiting program after 1190.0755506277085 minutes] datetime: 2021-09-28 14:47:42
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
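The launcher warning above means each worker in this job ran with OMP_NUM_THREADS=1 (and the exit a few lines up matches exit_duration_in_mins 1190 in the configuration below). If CPU-side work such as data loading or CPU Adam becomes the bottleneck, the variable can be raised before the numerical libraries initialize their thread pools; a hedged example, where the value 4 is arbitrary and the right number depends on cores available per process:

    import os

    # Must happen before torch/numpy are imported, since OpenMP
    # thread pools are sized at library initialization.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import torch  # imported deliberately after the variable is in place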
2021-09-28 14:48:37.053595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
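The async_io warning above names its own fix (apt install libaio-dev). Whether libaio is visible on a compute node can be checked from Python without root, for example:

    import ctypes.util

    # find_library returns None when libaio is absent, which is the
    # condition behind the async_io [NO] entries above.
    print(ctypes.util.find_library("aio"))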
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
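The environment block above is worth re-deriving on a fresh node when debugging version skew between ranks; the same facts are available from public version attributes. A small sketch, assuming only that this DeepSpeed build exposes __version__:

    import torch
    import deepspeed

    print("torch version :", torch.__version__)      # 1.8.1 in this run
    print("torch cuda    :", torch.version.cuda)     # 11.1
    print("deepspeed     :", deepspeed.__version__)  # 0.4.2+72ce55a here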
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1283386.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ 
True log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ False loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... None memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 4 position_embedding_type ......................... PositionEmbeddingType.rotary profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 1234 seq_length ...................................... 
2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... transformer_inference[OKAY] .. [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... quantizer[OKAY] .............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... 
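The "using world size: 64 ..." line and the arguments above are mutually consistent: the 64 ranks decompose into tensor-, pipeline- and data-parallel groups of 4 each, and the global batch of 512 is reached from micro-batches of 8 via 16 gradient-accumulation steps per data-parallel replica. A quick arithmetic check (plain Python, not Megatron code):

    tensor_mp, pipeline_mp, data_parallel = 4, 4, 4
    world_size = tensor_mp * pipeline_mp * data_parallel
    assert world_size == 64

    micro_batch, global_batch = 8, 512
    # gradient-accumulation steps implied by the arguments above
    grad_acc_steps = global_batch // (micro_batch * data_parallel)
    assert grad_acc_steps == 16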
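The rampup line above grows the global batch from 32 to 512 in increments of 32, i.e. 15 increments spread over 2,000,000 samples, roughly 133k samples per step. A sketch of that schedule under this reading (Megatron's exact step boundaries may differ):

    start, final, increment, ramp_samples = 32, 512, 32, 2_000_000
    num_increments = (final - start) // increment        # 15 steps
    samples_per_step = ramp_samples // num_increments    # ~133,333 samples

    def global_batch_size(consumed_samples):
        # a hedged reading of the rampup rule, not Megatron's exact code
        if consumed_samples >= ramp_samples:
            return final
        steps_done = consumed_samples // samples_per_step
        return min(final, start + steps_done * increment)

    assert global_batch_size(0) == 32
    assert global_batch_size(2_000_000) == 512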
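The learning-rate arguments (lr = 0.0002, min_lr = 1e-05, lr_warmup_samples = 183105, lr_decay_samples = 73242187, lr_decay_style = cosine) describe a sample-based schedule: linear warmup to the peak, then cosine decay down to the floor. A sketch assuming the conventional cosine form; Megatron's scheduler may count the decay horizon slightly differently:

    import math

    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_samples, decay_samples = 183_105, 73_242_187

    def lr_at(consumed_samples):
        if consumed_samples < warmup_samples:
            # linear warmup from 0 to max_lr
            return max_lr * consumed_samples / warmup_samples
        progress = min(1.0, (consumed_samples - warmup_samples)
                            / (decay_samples - warmup_samples))
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))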
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
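The padded-vocab line follows from make_vocab_size_divisible_by = 128 together with tensor_model_parallel_size = 4: the embedding table must split evenly across the 4 tensor-parallel ranks, so the vocab is rounded up to the next multiple of 128 * 4 = 512. The arithmetic reproduces the logged numbers:

    vocab_size, divisible_by, tp_size = 50_257, 128, 4
    multiple = divisible_by * tp_size                  # 512
    padded = ((vocab_size + multiple - 1) // multiple) * multiple
    assert padded == 50_688
    assert padded - vocab_size == 431                  # the "dummy tokens"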
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference [NO].. ....... [NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 /bin/sh: line 0: type: git: not found deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.DeepSpeed general environment info: ...... torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] ninja fused_lamb.................. .............[OKAY] [NO] .......-------------------------------------------------- [OKAY] op name ................ installed .. compatible -------------------------------------------------- sparse_attn ............ [NO] ....... cpu_adam[OKAY] ............... [YES]transformer .................. [OKAY][NO] ....... [OKAY] stochastic_transformer . fused_adam[NO] .................... [NO][OKAY] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 
0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja sparse_attn.................. ............[OKAY] [NO] --------------------------------------------------....... [OKAY]op name ................ installedtransformer .............. compatible[NO] --------------------------------------------------....... [OKAY] stochastic_transformer .cpu_adam [NO]............... .......[YES] [OKAY]...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninjasparse_attn .............................. [OKAY][NO] --------------------------------------------------....... [OKAY]op name ................ installedtransformer .. ............compatible [NO]-------------------------------------------------- ....... [OKAY] stochastic_transformercpu_adam ................ [NO][YES] ............. [OKAY] [OKAY] fused_adamninja ............. [NO] ....... [OKAY] .................. [OKAY]fused_lamb ............. [NO] .......-------------------------------------------------- [OKAY] op name ................ installed ..sparse_attn ............ [NO] .......compatible [OKAY] -------------------------------------------------- transformer ............ [NO] ....... [OKAY] stochastic_transformercpu_adam ................ [NO][YES] ............. [OKAY][OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] async_ioasync_io ............... ...............[NO] [NO]....... .......[NO] [NO] torch version .................... 1.8.1 transformer_inferencetransformer_inference .... [NO][NO] .............. 
[OKAY] [OKAY] torch cuda version ............... 11.1 utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] nvcc version ..................... 11.2 quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ---------------------------------------------------------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... quantizer[OKAY] .............. [NO]quantizer ....... ..............[OKAY] [NO] ....... [OKAY]-------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] async_io....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... quantizer[OKAY] .............. [NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 1.8.1 torch cuda versiontorch version ................................... 11.11.8.1 nvcc version torch cuda version..................... ...............11.2 11.1deepspeed install path nvcc version........... ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']11.2 deepspeed infodeepspeed install path .............................. 0.4.2+72ce55a, 72ce55a, big-science ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed wheel compiled w. deepspeed info...... 
...................torch 1.8, cuda 11.1 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ............... torch version .................... 1.8.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']torch cuda version ............... 11.1torch version nvcc version.................... .....................1.8.1 11.2torch cuda version deepspeed install path............... ...........11.1 ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']nvcc version .....................deepspeed info 11.2................... deepspeed install path0.4.2+72ce55a, 72ce55a, big-science ...........deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] torch 1.8, cuda 11.1deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] 1.8.1 torch versiontorch cuda version ................................... 1.8.111.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.4.2+72ce55a, 72ce55a, big-science['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. [NO] ....... utils[OKAY] .................. [YES] ...... [OKAY]utils .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY]quantizer .............. [NO] --------------------------------------------------....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. 
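The `/bin/sh: line 0: type: git: not found` lines show that the startup code probes for a git binary through the shell before printing the Megatron banner; the compute nodes have no git on PATH, so the hash and branch fall back to "unknown". A sketch of that kind of fallback (illustrative only, not the repo's exact code):

    # Sketch of the fallback behind the "Git info for Megatron" banner above
    # (not the repo's exact code; the banner format is copied from the log).
    import subprocess

    def git_info(default="unknown"):
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
        except (OSError, subprocess.CalledProcessError):
            # No git binary (as on these compute nodes) or not a git checkout.
            git_hash = git_branch = default
        return git_hash, git_branch

    h, b = git_info()
    print(f"**** Git info for Megatron: git_hash={h} git_branch={b} ****")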
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
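In this report, only cpu_adam and utils were pre-built into the wheel; every op marked [NO]/[OKAY] is compatible and will be JIT-compiled with ninja on first use. The same check can be run programmatically; a sketch, assuming the op_builder layout of this 0.4.x DeepSpeed tree (the builder class names are an assumption about that version):

    # Sketch: query op compatibility the way the report above does.
    # Assumption: these builder classes live under deepspeed.ops.op_builder
    # in this 0.4.x tree (async_io needs libaio, hence the warning above).
    from deepspeed.ops.op_builder import AsyncIOBuilder, FusedAdamBuilder, UtilsBuilder

    for builder in (AsyncIOBuilder(), FusedAdamBuilder(), UtilsBuilder()):
        # is_compatible() mirrors the right-hand [OKAY]/[NO] column.
        print(builder.absolute_name(), builder.is_compatible())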
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-28 14:48:43,402] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.308 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
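The UserWarning above repeats once per rank and per extension in the raw log; it is harmless here ("ninja: no work to do." shows the kernels were already built), but the clean fix is to point torch.utils.cpp_extension at g++ explicitly, since the JIT build path reads the CXX environment variable. A sketch; in practice one would export CXX=g++ in the job script before the run:

    # Sketch: select g++ for PyTorch's JIT extension builds. CXX must be set
    # before any cpp_extension.load(...) call triggers a build; setdefault
    # keeps an explicit CXX from the job environment intact.
    import os
    os.environ.setdefault("CXX", "g++")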
>>> done with compiling and loading fused kernels. Compilation time: 18.336 seconds
time to initialize megatron (seconds): 62.048
[after megatron is initialized] datetime: 2021-09-28 14:49:02
building GPT model ...
[2021-09-28 14:49:02,161] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-28 14:49:02,162] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-09-28 14:49:02,162] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.42 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-28 14:49:02,683] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-09-28 14:49:03,018] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-28 14:49:03,019] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB  Max_MA 0.22 GB  CA 0.24 GB  Max_CA 0 GB
[2021-09-28 14:49:03,019] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.8 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
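The topology dump and the per-rank parameter counts above follow a fixed coordinate order: the model (tensor-parallel) index varies fastest, then data, then pipe. A small sketch reproducing the printed rank mapping (plain Python, not Megatron/DeepSpeed code; the grid sizes are read off this log):

    # Grid from this log: 4 pipeline stages x 4 data-parallel replicas
    # x 4 tensor("model")-parallel slices = 64 ranks.
    PIPE, DATA, MODEL = 4, 4, 4

    def global_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe
        return (pipe * DATA + data) * MODEL + model

    assert global_rank(0, 0, 0) == 0    # ProcessCoord(pipe=0, data=0, model=0): 0
    assert global_rank(0, 1, 0) == 4    # ProcessCoord(pipe=0, data=1, model=0): 4
    assert global_rank(1, 0, 0) == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert global_rank(3, 3, 3) == 63   # ProcessCoord(pipe=3, data=3, model=3): 63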
[2021-09-28 14:49:03,038] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-28 14:49:03,105] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-28 14:49:03,105] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-28 14:49:03,105] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-28 14:49:03,105] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-28 14:49:03,105] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-28 14:49:03,105] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-28 14:49:03,106] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-28 14:49:03,106] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-28 14:49:03,106] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-28 14:49:03,106] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-28 14:49:03,353] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-28 14:49:03,353] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-28 14:49:03,353] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-28 14:49:03,353] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   amp_params ................... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   dump_state ................... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-28 14:49:03,354] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   pld_params ................... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   world_size ................... 4
[2021-09-28 14:49:03,355] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-09-28 14:49:03,356] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-09-28 14:49:03,356] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-09-28 14:49:03,356] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
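The batch-size settings in this dump are internally consistent, which is worth checking whenever the config is edited: DeepSpeed requires train_batch_size = micro-batch size × gradient-accumulation steps × data-parallel size. A quick check with the values printed above (plain Python; note that the world_size of 4 reported here is the data-parallel size, not the 64-GPU job size):

    # Values copied from the DeepSpeedEngine configuration above.
    train_micro_batch_size_per_gpu = 8
    gradient_accumulation_steps = 16
    data_parallel_size = 4   # 64 GPUs / (TP=4 * PP=4) = 4 replicas

    train_batch_size = (train_micro_batch_size_per_gpu
                        * gradient_accumulation_steps
                        * data_parallel_size)
    assert train_batch_size == 512   # matches "train_batch_size ... 512" in the dump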
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-28 14:49:03,644] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 34
    ... (the same message is printed once for each of the 64 ranks)
loading 4 zero partition checkpoints for rank 38
    ... (likewise repeated for each of the 64 ranks, interleaved with the messages above)
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 20631
time (ms) | load-checkpoint: 1891.18
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
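The repeated "4 ZeRO state_dicts" count follows from the parallel layout rather than anything checkpoint-specific: ZeRO stage 1 shards optimizer state across the data-parallel group, so on resume each rank reads one shard per data-parallel peer. A back-of-the-envelope check (plain Python; the variable names are mine, this is not DeepSpeed's loader):

    # Layout from this log: 64 GPUs = TP 4 x PP 4 x DP 4.
    world_size = 64
    tensor_parallel = 4
    pipeline_parallel = 4
    data_parallel = world_size // (tensor_parallel * pipeline_parallel)
    assert data_parallel == 4

    # ZeRO-1 partitions optimizer states over the DP group, so each rank
    # loads data_parallel (= 4) ZeRO state_dicts, as logged for every rank.
    zero_state_dicts_per_rank = data_parallel
    total_shard_reads = world_size * zero_state_dicts_per_rank   # 256 reads job-wide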
estimated model parameters: 1.209483264
estimated model parameters: 1.624784896
estimated model parameters: 1.62471936
    ... (one such line per rank; the values above repeat across the 64 ranks)
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
    ... (one such line per rank)
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-28 14:49:05
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.023376 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.122 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.190 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.047 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-28 14:49:10
done with setup ...
training ...
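The split boundaries printed above can be re-derived from the document count, assuming the usual Megatron --split 949,50,1 weighting (the split argument itself is not shown in this log, so treating it as 949,50,1 is an assumption). A sketch mirroring Megatron's rounding-remainder adjustment:

    # Hypothetical re-derivation of the "dataset split" boundaries above,
    # assuming a 949/50/1 train/validation/test split over the OSCAR documents.
    size = 304230423
    splits = [0.949, 0.05, 0.001]

    index = [0]
    for s in splits:
        index.append(index[-1] + int(round(s * float(size))))
    # distribute the rounding remainder so the last boundary equals `size`,
    # as Megatron's split helper does
    diff = index[-1] - size
    index = [index[0]] + [i - diff for i in index[1:]]

    assert index == [0, 288714672, 303926193, 304230423]   # matches the log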
time (ms) | model-and-optimizer-setup: 3582.86 | train/valid/test-data-iterators-setup: 4142.39 Number of parameters: 1.624784896 billion Number of parameters: 1.62471936 billionNumber of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.62471936 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billionNumber of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters: 1.62471936 billion Number of parameters: 1.624784896 billion Number of parameters: 1.624784896 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters: 1.62471936 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billionNumber of parameters: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.624784896 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.62471936 billionNumber of parameters without embeddings: 1.209483264 billionNumber of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.2095488 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters without embeddings: 1.209483264 billion Number of parameters: 1.62471936 
[before the start of training step] datetime: 2021-09-28 14:49:10
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-28 14:49:10,552] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 17] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4588.0 | max reserved: 4588.0
[Rank 33] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 16] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 1] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 32] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 49] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 48] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 0] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5430.0 | max reserved: 5430.0
[Rank 18] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 34] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4316.0 | max reserved: 4316.0
[Rank 2] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 50] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 19] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 3] (after 20800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 35] (after 20800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4428.0 | max reserved: 4428.0
[Rank 51] (after 20800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
iteration 20800/ 152972 | consumed samples: 5569984 | elapsed time per iteration (ms): 6006.8 | learning rate: 1.975E-04 | global batch size: 512 | lm loss: 3.081904E+00 | loss scale: 1048576.0 | grad norm: 85206.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21000/ 152972 | consumed samples: 5672384 | elapsed time per iteration (ms): 5941.9 | learning rate: 1.974E-04 | global batch size: 512 | lm loss: 3.071931E+00 | loss scale: 1048576.0 | grad norm: 95789.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 21000 | lm loss value: 3.026921E+00 | lm loss PPL: 2.063360E+01 |
-------------------------------------------------------------------------------------------------
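The per-rank memory lines are plain reads of the CUDA caching-allocator counters, converted to MB. A minimal sketch of how such a report can be produced (Megatron's actual helper is report_memory; the formatting below is illustrative):

    import torch

    def report_memory(tag: str) -> None:
        # Allocator byte counters printed in MB, mirroring the
        # "[Rank N] ... memory (MB) | ..." lines above.
        mb = 1024 * 1024
        print(f"{tag} memory (MB)"
              f" | allocated: {torch.cuda.memory_allocated() / mb}"
              f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
              f" | reserved: {torch.cuda.memory_reserved() / mb}"
              f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")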
saving checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 15:28:46,768] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step21000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1517.85
iteration 21200/ 152972 | consumed samples: 5774784 | elapsed time per iteration (ms): 6804.8 | learning rate: 1.973E-04 | global batch size: 512 | lm loss: 3.074067E+00 | loss scale: 1048576.0 | grad norm: 99803.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21400/ 152972 | consumed samples: 5877184 | elapsed time per iteration (ms): 5943.5 | learning rate: 1.972E-04 | global batch size: 512 | lm loss: 3.076500E+00 | loss scale: 2097152.0 | grad norm: 227891.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21600/ 152972 | consumed samples: 5979584 | elapsed time per iteration (ms): 5957.0 | learning rate: 1.971E-04 | global batch size: 512 | lm loss: 3.074863E+00 | loss scale: 524288.0 | grad norm: 47091.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 21800/ 152972 | consumed samples: 6081984 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.970E-04 | global batch size: 512 | lm loss: 3.078544E+00 | loss scale: 524288.0 | grad norm: 57398.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 17:07:54,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=44, lr=[0.0001968683020822059, 0.0001968683020822059], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 22000/ 152972 | consumed samples: 6184384 | elapsed time per iteration (ms): 5951.3 | learning rate: 1.969E-04 | global batch size: 512 | lm loss: 3.076009E+00 | loss scale: 524288.0 | grad norm: 51889.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 22000 loss: 3.0801 iter time (s): 0.003 samples/sec: 171204.482
-------------------------------------------------------------------------------------------------
validation loss at iteration 22000 | lm loss value: 3.029455E+00 | lm loss PPL: 2.068596E+01 |
-------------------------------------------------------------------------------------------------
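The "lm loss PPL" column is just the exponential of the reported lm loss. For the iteration-22000 validation above:

    import math

    lm_loss = 3.029455        # validation lm loss at iteration 22000
    print(math.exp(lm_loss))  # 20.68596... == 2.068596E+01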
iteration 22200/ 152972 | consumed samples: 6286784 | elapsed time per iteration (ms): 6844.1 | learning rate: 1.968E-04 | global batch size: 512 | lm loss: 3.077078E+00 | loss scale: 1048576.0 | grad norm: 111055.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22400/ 152972 | consumed samples: 6389184 | elapsed time per iteration (ms): 5942.7 | learning rate: 1.967E-04 | global batch size: 512 | lm loss: 3.075747E+00 | loss scale: 262144.0 | grad norm: 26888.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 18:00:26,661] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step22500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.99
iteration 22600/ 152972 | consumed samples: 6491584 | elapsed time per iteration (ms): 5949.3 | learning rate: 1.965E-04 | global batch size: 512 | lm loss: 3.075193E+00 | loss scale: 262144.0 | grad norm: 24802.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 22800/ 152972 | consumed samples: 6593984 | elapsed time per iteration (ms): 5946.6 | learning rate: 1.964E-04 | global batch size: 512 | lm loss: 3.075609E+00 | loss scale: 524288.0 | grad norm: 62861.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23000/ 152972 | consumed samples: 6696384 | elapsed time per iteration (ms): 5938.8 | learning rate: 1.963E-04 | global batch size: 512 | lm loss: 3.469135E+00 | loss scale: 65536.0 | grad norm: 7574.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 23000 | lm loss value: 3.073435E+00 | lm loss PPL: 2.161602E+01 |
-------------------------------------------------------------------------------------------------
iteration 23200/ 152972 | consumed samples: 6798784 | elapsed time per iteration (ms): 6816.4 | learning rate: 1.962E-04 | global batch size: 512 | lm loss: 3.103661E+00 | loss scale: 65536.0 | grad norm: 7229.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23400/ 152972 | consumed samples: 6901184 | elapsed time per iteration (ms): 5953.2 | learning rate: 1.961E-04 | global batch size: 512 | lm loss: 3.083948E+00 | loss scale: 131072.0 | grad norm: 13699.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23600/ 152972 | consumed samples: 7003584 | elapsed time per iteration (ms): 5961.6 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 3.072135E+00 | loss scale: 131072.0 | grad norm: 12480.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 23800/ 152972 | consumed samples: 7105984 | elapsed time per iteration (ms): 5976.2 | learning rate: 1.958E-04 | global batch size: 512 | lm loss: 3.070117E+00 | loss scale: 131072.0 | grad norm: 12700.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 20:32:11,624] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=51, lr=[0.0001957187411128351, 0.0001957187411128351], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 24000/ 152972 | consumed samples: 7208384 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.957E-04 | global batch size: 512 | lm loss: 3.065704E+00 | loss scale: 262144.0 | grad norm: 28862.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 24000 loss: 3.0597 iter time (s): 0.003 samples/sec: 172059.520
-------------------------------------------------------------------------------------------------
validation loss at iteration 24000 | lm loss value: 3.012676E+00 | lm loss PPL: 2.034177E+01 |
-------------------------------------------------------------------------------------------------
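The loss-scale column and the growing skipped= counters in the step lines (44 at step 22000, 51 at step 24000) are fp16 dynamic loss scaling at work: a step whose gradients overflow is skipped and the scale is halved (e.g. the drop to 65536.0 around the loss spike at iteration 23000), and after a long enough run of clean steps the scale doubles again. A minimal sketch of that policy, with illustrative parameter names (the real implementation lives in DeepSpeed's fp16 runtime):

    # Minimal dynamic loss scaler: halve on overflow (and skip the
    # optimizer step), double after `window` clean steps in a row.
    class DynamicLossScaler:
        def __init__(self, init_scale: float = 2.0 ** 20, window: int = 1000):
            self.scale = init_scale  # 2**20 = 1048576.0, as in the log
            self.window = window
            self.clean_steps = 0

        def update(self, found_overflow: bool) -> bool:
            if found_overflow:
                self.scale = max(self.scale / 2.0, 1.0)
                self.clean_steps = 0
                return False  # caller skips this optimizer step
            self.clean_steps += 1
            if self.clean_steps % self.window == 0:
                self.scale *= 2.0
            return True  # safe to apply the optimizer step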
saving checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 20:35:04,941] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step24000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1552.77
iteration 24200/ 152972 | consumed samples: 7310784 | elapsed time per iteration (ms): 6808.7 | learning rate: 1.956E-04 | global batch size: 512 | lm loss: 3.057561E+00 | loss scale: 262144.0 | grad norm: 26372.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24400/ 152972 | consumed samples: 7413184 | elapsed time per iteration (ms): 5943.4 | learning rate: 1.955E-04 | global batch size: 512 | lm loss: 3.058038E+00 | loss scale: 524288.0 | grad norm: 57991.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24600/ 152972 | consumed samples: 7515584 | elapsed time per iteration (ms): 5946.9 | learning rate: 1.953E-04 | global batch size: 512 | lm loss: 3.055022E+00 | loss scale: 524288.0 | grad norm: 63715.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 24800/ 152972 | consumed samples: 7617984 | elapsed time per iteration (ms): 5944.8 | learning rate: 1.952E-04 | global batch size: 512 | lm loss: 3.050838E+00 | loss scale: 524288.0 | grad norm: 57048.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25000/ 152972 | consumed samples: 7720384 | elapsed time per iteration (ms): 5943.1 | learning rate: 1.951E-04 | global batch size: 512 | lm loss: 3.051694E+00 | loss scale: 1048576.0 | grad norm: 102955.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 25000 | lm loss value: 3.004129E+00 | lm loss PPL: 2.016864E+01 |
-------------------------------------------------------------------------------------------------
iteration 25200/ 152972 | consumed samples: 7822784 | elapsed time per iteration (ms): 6811.8 | learning rate: 1.949E-04 | global batch size: 512 | lm loss: 3.051702E+00 | loss scale: 1048576.0 | grad norm: 116512.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25400/ 152972 | consumed samples: 7925184 | elapsed time per iteration (ms): 5938.4 | learning rate: 1.948E-04 | global batch size: 512 | lm loss: 3.046103E+00 | loss scale: 1048576.0 | grad norm: 107437.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-28 23:06:33,969] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step25500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1746.37
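Each save above writes a global_step<N>/ directory containing per-rank *_model_states.pt shards (plus optimizer state files). These are ordinary torch pickles and can be inspected offline; a small sketch (the relative path and the exact keys are illustrative and depend on the DeepSpeed version):

    import torch

    # Inspect one model-states shard from a DeepSpeed checkpoint.
    state = torch.load(
        "checkpoints/global_step25500/mp_rank_00_model_states.pt",
        map_location="cpu",
    )
    print(sorted(state.keys()))  # e.g. 'iteration', 'module', ...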
iteration 25600/ 152972 | consumed samples: 8027584 | elapsed time per iteration (ms): 5956.0 | learning rate: 1.947E-04 | global batch size: 512 | lm loss: 3.046467E+00 | loss scale: 1048576.0 | grad norm: 131530.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 25800/ 152972 | consumed samples: 8129984 | elapsed time per iteration (ms): 5962.9 | learning rate: 1.945E-04 | global batch size: 512 | lm loss: 3.042575E+00 | loss scale: 1048576.0 | grad norm: 110603.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-28 23:56:13,334] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=54, lr=[0.0001943917127426917, 0.0001943917127426917], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 26000 loss: 3.0654 iter time (s): 0.003 samples/sec: 171708.222
iteration 26000/ 152972 | consumed samples: 8232384 | elapsed time per iteration (ms): 5952.7 | learning rate: 1.944E-04 | global batch size: 512 | lm loss: 3.040515E+00 | loss scale: 524288.0 | grad norm: 54404.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 26000 | lm loss value: 2.993362E+00 | lm loss PPL: 1.995266E+01 |
-------------------------------------------------------------------------------------------------
iteration 26200/ 152972 | consumed samples: 8334784 | elapsed time per iteration (ms): 6820.7 | learning rate: 1.942E-04 | global batch size: 512 | lm loss: 3.042284E+00 | loss scale: 262144.0 | grad norm: 28784.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26400/ 152972 | consumed samples: 8437184 | elapsed time per iteration (ms): 5943.6 | learning rate: 1.941E-04 | global batch size: 512 | lm loss: 3.096729E+00 | loss scale: 65536.0 | grad norm: 71857.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26600/ 152972 | consumed samples: 8539584 | elapsed time per iteration (ms): 5950.9 | learning rate: 1.940E-04 | global batch size: 512 | lm loss: 3.222694E+00 | loss scale: 65536.0 | grad norm: 6616.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 26800/ 152972 | consumed samples: 8641984 | elapsed time per iteration (ms): 5947.2 | learning rate: 1.938E-04 | global batch size: 512 | lm loss: 3.055728E+00 | loss scale: 65536.0 | grad norm: 6705.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27000/ 152972 | consumed samples: 8744384 | elapsed time per iteration (ms): 5961.8 | learning rate: 1.937E-04 | global batch size: 512 | lm loss: 3.039523E+00 | loss scale: 131072.0 | grad norm: 14053.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 27000 | lm loss value: 2.990162E+00 | lm loss PPL: 1.988890E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 01:41:13,231] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step27000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1659.82
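The iteration records are internally consistent: with the global batch size fixed at 512, each 200-iteration logging interval consumes 200 * 512 = 102,400 samples, and the elapsed time per iteration gives the end-to-end throughput. Checking the interval between iterations 26800 and 27000 above:

    interval_iters = 27000 - 26800
    global_batch = 512
    print(interval_iters * global_batch)  # 102400 == 8744384 - 8641984
    print(global_batch / 5.9618)          # ~85.9 samples/s at 5961.8 ms/iter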
iteration 27200/ 152972 | consumed samples: 8846784 | elapsed time per iteration (ms): 6836.3 | learning rate: 1.935E-04 | global batch size: 512 | lm loss: 3.037899E+00 | loss scale: 131072.0 | grad norm: 13469.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27400/ 152972 | consumed samples: 8949184 | elapsed time per iteration (ms): 5955.1 | learning rate: 1.934E-04 | global batch size: 512 | lm loss: 3.029759E+00 | loss scale: 262144.0 | grad norm: 27862.052 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27600/ 152972 | consumed samples: 9051584 | elapsed time per iteration (ms): 5956.2 | learning rate: 1.932E-04 | global batch size: 512 | lm loss: 3.028688E+00 | loss scale: 262144.0 | grad norm: 26683.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 27800/ 152972 | consumed samples: 9153984 | elapsed time per iteration (ms): 5954.4 | learning rate: 1.930E-04 | global batch size: 512 | lm loss: 3.028962E+00 | loss scale: 262144.0 | grad norm: 28512.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 03:20:30,877] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=57, lr=[0.0001928919118506926, 0.0001928919118506926], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 28000 loss: 2.9725 iter time (s): 0.003 samples/sec: 171993.154
iteration 28000/ 152972 | consumed samples: 9256384 | elapsed time per iteration (ms): 5961.5 | learning rate: 1.929E-04 | global batch size: 512 | lm loss: 3.026538E+00 | loss scale: 524288.0 | grad norm: 52854.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 28000 | lm loss value: 2.977758E+00 | lm loss PPL: 1.964373E+01 |
-------------------------------------------------------------------------------------------------
iteration 28200/ 152972 | consumed samples: 9358784 | elapsed time per iteration (ms): 6834.0 | learning rate: 1.927E-04 | global batch size: 512 | lm loss: 3.021742E+00 | loss scale: 524288.0 | grad norm: 53422.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28400/ 152972 | consumed samples: 9461184 | elapsed time per iteration (ms): 5955.7 | learning rate: 1.926E-04 | global batch size: 512 | lm loss: 3.021417E+00 | loss scale: 1048576.0 | grad norm: 103346.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 04:13:04,596] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step28500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1677.89
iteration 28600/ 152972 | consumed samples: 9563584 | elapsed time per iteration (ms): 5959.7 | learning rate: 1.924E-04 | global batch size: 512 | lm loss: 3.018751E+00 | loss scale: 1048576.0 | grad norm: 115044.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 28800/ 152972 | consumed samples: 9665984 | elapsed time per iteration (ms): 5955.0 | learning rate: 1.922E-04 | global batch size: 512 | lm loss: 3.018619E+00 | loss scale: 1048576.0 | grad norm: 119606.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
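The learning-rate column is creeping down from its 2e-4 peak. The usual Megatron schedule is linear warmup followed by cosine decay; the sketch below shows only that shape, and all of its parameters (lr_max, lr_min, warmup, total) are assumptions, since this run's real schedule is driven by consumed samples rather than iteration counts:

    import math

    def warmup_then_cosine(step, lr_max=2e-4, lr_min=1e-5,
                           warmup=2000, total=152972):
        # Shape sketch only: linear warmup, then cosine decay to lr_min.
        if step < warmup:
            return lr_max * step / warmup
        t = (step - warmup) / (total - warmup)
        return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

    print(warmup_then_cosine(28000))  # ~1.86e-4, near the logged 1.929E-04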
iteration 29000/ 152972 | consumed samples: 9768384 | elapsed time per iteration (ms): 5947.8 | learning rate: 1.921E-04 | global batch size: 512 | lm loss: 3.024303E+00 | loss scale: 131072.0 | grad norm: 17674.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 29000 | lm loss value: 2.992133E+00 | lm loss PPL: 1.992815E+01 |
-------------------------------------------------------------------------------------------------
iteration 29200/ 152972 | consumed samples: 9870784 | elapsed time per iteration (ms): 6814.3 | learning rate: 1.919E-04 | global batch size: 512 | lm loss: 3.019525E+00 | loss scale: 131072.0 | grad norm: 13215.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29400/ 152972 | consumed samples: 9973184 | elapsed time per iteration (ms): 5937.3 | learning rate: 1.917E-04 | global batch size: 512 | lm loss: 3.013686E+00 | loss scale: 131072.0 | grad norm: 13123.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29600/ 152972 | consumed samples: 10075584 | elapsed time per iteration (ms): 5941.9 | learning rate: 1.916E-04 | global batch size: 512 | lm loss: 3.012725E+00 | loss scale: 262144.0 | grad norm: 27310.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 29800/ 152972 | consumed samples: 10177984 | elapsed time per iteration (ms): 5941.5 | learning rate: 1.914E-04 | global batch size: 512 | lm loss: 3.009491E+00 | loss scale: 262144.0 | grad norm: 24081.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 06:44:36,534] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=62, lr=[0.0001912239933021946, 0.0001912239933021946], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 30000 loss: 3.0179 iter time (s): 0.003 samples/sec: 171915.340
iteration 30000/ 152972 | consumed samples: 10280384 | elapsed time per iteration (ms): 5941.2 | learning rate: 1.912E-04 | global batch size: 512 | lm loss: 3.009170E+00 | loss scale: 524288.0 | grad norm: 53657.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 30000 | lm loss value: 2.961604E+00 | lm loss PPL: 1.932894E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 06:47:30,106] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step30000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1619.29
iteration 30200/ 152972 | consumed samples: 10382784 | elapsed time per iteration (ms): 6817.6 | learning rate: 1.910E-04 | global batch size: 512 | lm loss: 3.006530E+00 | loss scale: 524288.0 | grad norm: 56035.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30400/ 152972 | consumed samples: 10485184 | elapsed time per iteration (ms): 5948.7 | learning rate: 1.909E-04 | global batch size: 512 | lm loss: 3.004212E+00 | loss scale: 524288.0 | grad norm: 52717.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30600/ 152972 | consumed samples: 10587584 | elapsed time per iteration (ms): 5949.1 | learning rate: 1.907E-04 | global batch size: 512 | lm loss: 3.003795E+00 | loss scale: 1048576.0 | grad norm: 95509.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 30800/ 152972 | consumed samples: 10689984 | elapsed time per iteration (ms): 5937.0 | learning rate: 1.905E-04 | global batch size: 512 | lm loss: 3.168708E+00 | loss scale: 16384.0 | grad norm: 1928.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31000/ 152972 | consumed samples: 10792384 | elapsed time per iteration (ms): 5935.4 | learning rate: 1.903E-04 | global batch size: 512 | lm loss: 3.018010E+00 | loss scale: 16384.0 | grad norm: 1423.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 31000 | lm loss value: 2.952177E+00 | lm loss PPL: 1.914759E+01 |
-------------------------------------------------------------------------------------------------
iteration 31200/ 152972 | consumed samples: 10894784 | elapsed time per iteration (ms): 6821.2 | learning rate: 1.901E-04 | global batch size: 512 | lm loss: 3.006021E+00 | loss scale: 32768.0 | grad norm: 3073.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31400/ 152972 | consumed samples: 10997184 | elapsed time per iteration (ms): 5933.6 | learning rate: 1.900E-04 | global batch size: 512 | lm loss: 3.001093E+00 | loss scale: 32768.0 | grad norm: 3306.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 09:18:59,700] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step31500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1581.54
iteration 31600/ 152972 | consumed samples: 11099584 | elapsed time per iteration (ms): 5942.5 | learning rate: 1.898E-04 | global batch size: 512 | lm loss: 2.997809E+00 | loss scale: 32768.0 | grad norm: 3361.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 31800/ 152972 | consumed samples: 11201984 | elapsed time per iteration (ms): 5950.9 | learning rate: 1.896E-04 | global batch size: 512 | lm loss: 2.991640E+00 | loss scale: 65536.0 | grad norm: 6164.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 10:08:32,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=70, lr=[0.0001893926396264795, 0.0001893926396264795], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 32000 loss: 3.0029 iter time (s): 0.003 samples/sec: 172539.903
iteration 32000/ 152972 | consumed samples: 11304384 | elapsed time per iteration (ms): 5943.2 | learning rate: 1.894E-04 | global batch size: 512 | lm loss: 2.991167E+00 | loss scale: 65536.0 | grad norm: 6676.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 32000 | lm loss value: 2.946570E+00 | lm loss PPL: 1.904053E+01 |
-------------------------------------------------------------------------------------------------
iteration 32200/ 152972 | consumed samples: 11406784 | elapsed time per iteration (ms): 6826.7 | learning rate: 1.892E-04 | global batch size: 512 | lm loss: 2.990183E+00 | loss scale: 131072.0 | grad norm: 13444.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 32268 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 10:38:02,123] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step32268/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 32268 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1589.37
[exiting program after 1190.0289231856664 minutes] datetime: 2021-09-29 10:38:03
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-09-29 10:38:19.847281: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
..................[OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name ................op nameop name................ installed................................installed .. installed..installed compatible compatible.. .. -------------------------------------------------- -------------------------------------------------- compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... cpu_adam[OKAY]cpu_adam...... ...............[OKAY]............... [YES][YES] ............ [OKAY]fused_adam[OKAY] fused_adam............. .............[NO] [NO]....... .......[OKAY] fused_adamfused_adam[OKAY] ............. [NO]............. fused_lambfused_lamb[NO]....... ............. ............. .......[NO][NO][OKAY] ....... .......[OKAY]fused_lamb[OKAY] [OKAY]............. [NO]fused_lamb .................... [OKAY][NO] .......sparse_attn [OKAY]sparse_attn............ ............[NO] [NO]....... .......[OKAY] sparse_attn [OKAY] transformer ............sparse_attntransformer............ [NO] [NO]............................... .......[OKAY][NO][NO] .......[OKAY].......stochastic_transformer [OKAY][OKAY]. transformer[NO]transformer stochastic_transformer............ ................... . [NO] [NO][NO] [OKAY] .............. .......[OKAY][OKAY] [OKAY] stochastic_transformer . [NO]stochastic_transformer ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name................op name op name................ installed ................installed .................. installed..compatible installed .. --------------------------------------------------compatible .. compatible --------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... cpu_adamcpu_adam [YES] [OKAY].................................... [OKAY][YES][YES] ............ [OKAY][OKAY] fused_adam ............. [NO] .......fused_adam [OKAY]............. fused_adam[NO]fused_adam fused_lamb ............. ....... .......................... [NO] [OKAY] [NO][NO]....... ....... .......[OKAY] [OKAY] [OKAY] fused_lamb ............. fused_lamb[NO]fused_lamb ................................. [NO][OKAY][NO] ..............sparse_attn [OKAY]............[OKAY] [NO] ....... sparse_attn[OKAY] ............ [NO]transformer ................... [OKAY][NO] sparse_attn sparse_attn.......transformer............ ............[OKAY] ............ [NO][NO][NO] stochastic_transformer ..................... . [OKAY] [OKAY][NO] [OKAY] transformer....... transformer............[OKAY]stochastic_transformer .............[NO] [NO][NO]....... ....... ....... [OKAY] [OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name ................................................ ................installedinstalled installed installed .. .... .. compatiblecompatiblecompatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam --------------------------------------------------cpu_adam............... ............... [YES]...............[YES] ......[YES] ......[OKAY] [OKAY]......cpu_adam [OKAY]............... fused_adam ............. [NO][YES]fused_adam ....... ......fused_adam ............. [OKAY] ............. [NO] [OKAY] [NO] fused_lamb .............. .............[OKAY][OKAY] [NO] ....... [OKAY]fused_lambfused_lamb fused_adam ............. .......................... [NO][NO] .............. [OKAY] [OKAY] [NO] sparse_attn....... ............[OKAY] [NO]sparse_attn sparse_attn ....... ............[OKAY] fused_lamb............ [NO] [NO] transformer ....... ................................ [OKAY][NO][NO][OKAY] ....... transformer [OKAY]............transformer .......[NO]............ stochastic_transformer ....... [NO] .[OKAY] [OKAY] ....... [NO] [OKAY].......stochastic_transformer [OKAY]. [NO] stochastic_transformer....... sparse_attn.[OKAY] [NO] ................... [OKAY][NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name................................op name installed................installed................ .. ..installed installedcompatible compatible.. -------------------------------------------------- ..compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam .............................. [YES]cpu_adam cpu_adam......[YES] ...............[OKAY]..................... [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY][NO] fused_adam.......fused_adamfused_lamb ............. [OKAY]............. ............. [NO][NO] fused_lamb[NO] .............. ............. .......[OKAY] [OKAY] [OKAY][NO] fused_lamb....... ............. fused_lamb [OKAY][NO] ....................sparse_attn [NO][OKAY]............ [NO].......sparse_attn ....... ............[OKAY][OKAY] [NO] ....... transformersparse_attn[OKAY] ........................ transformer[NO] [NO]sparse_attn............ ....... .......[NO] ............ [OKAY] [OKAY] .......[NO] [OKAY]stochastic_transformer.......transformer .[OKAY]............ stochastic_transformer[NO][NO] .......transformer ........ [OKAY] [NO][OKAY]............ .......[NO] [OKAY] stochastic_transformer ........ [OKAY][NO] ....... [OKAY]stochastic_transformer . [NO] ....... 
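Every rank prints this op report at launch: only cpu_adam is pre-built here, and the remaining ops are compatible and will be JIT compiled via ninja on first use. The same checks can be reproduced by hand. A minimal sketch, assuming DeepSpeed 0.4.x, where the builder classes live in deepspeed.ops.op_builder (class names can vary across versions):

    # Sketch: query the op builders the way the report does (DeepSpeed 0.4.x).
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        # is_compatible() checks the system dependencies needed to JIT-build the op
        print(builder.NAME, "compatible:", builder.is_compatible())

    # An op reported [NO]/[OKAY] is not pre-built but can be JIT compiled on
    # first use; load() runs the ninja build if the extension is missing.
    if FusedAdamBuilder().is_compatible():
        fused_adam_module = FusedAdamBuilder().load()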
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
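async_io is the one op that fails the compatibility check on this system: it needs the libaio development headers, and the warning gives the fix (`apt install libaio-dev` on Debian/Ubuntu). A minimal sketch for re-checking after installing the package, assuming DeepSpeed 0.4.x's AsyncIOBuilder:

    # Sketch: re-check async_io availability after installing libaio-dev.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    if AsyncIOBuilder().is_compatible():
        print("async_io can be JIT compiled")
    else:
        print("async_io still unavailable: libaio-dev headers not found")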
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
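This environment summary is the same one the ds_report command that ships with DeepSpeed prints. A minimal sketch that reproduces the core facts from Python (the exact report formatting is ds_report's, not replicated here):

    # Sketch: print the key environment facts the report summarizes.
    import torch
    import deepspeed

    print("torch install path ...", torch.__path__)
    print("torch version ........", torch.__version__)
    print("torch cuda version ...", torch.version.cuda)
    print("deepspeed version ....", deepspeed.__version__)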
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ....... [NO]............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] transformer_inference ..utils ..................[NO] [YES] ............. [OKAY][OKAY] quantizer .............. [NO] .......utils [OKAY] .................. [YES] --------------------------------------------------...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO]............... .......[NO] [NO]....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer .............. ..............[NO] [NO]....... .......[OKAY] [OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] .. .......[NO] [OKAY]....... [OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ...............async_io [NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... 
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
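The op report above is what DeepSpeed prints at startup; the same table can be regenerated on demand when debugging a node. A minimal sketch, assuming DeepSpeed's ds_report console script (which prints the same op-compatibility table and environment summary) is on PATH:

    import subprocess

    # Re-print the op compatibility table (cpu_adam, fused_adam, sparse_attn, ...)
    # and the torch/cuda/nvcc environment info shown above.
    subprocess.run(["ds_report"], check=True)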
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
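The three parallelism degrees multiply out to the world size, one rank per GPU. A quick sanity-check sketch; the four numbers come from the line above, nothing else is taken from the run:

    # data-parallel x tensor-model-parallel x pipeline-model-parallel = world size
    dp, tp, pp = 4, 4, 4
    assert dp * tp * pp == 64  # 64 ranks, one per GPU
    print(f"replicas={dp}, tensor shards={tp}, pipeline stages={pp}")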
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1291626.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
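Two derived quantities are worth spelling out. With global_batch_size 512, micro_batch_size 8 and data_parallel_size 4, each optimizer step accumulates 512 / (8 x 4) = 16 micro-batches per replica once the rampup finishes. The rampup itself grows the global batch from 32 to 512 in increments of 32 spread over 2,000,000 samples. A minimal sketch of that schedule, assuming the usual Megatron semantics for rampup_batch_size = [start, increment, ramp_samples]:

    def global_batch_size(consumed_samples: int,
                          start: int = 32, increment: int = 32,
                          ramp_samples: int = 2_000_000, target: int = 512) -> int:
        num_increments = (target - start) // increment         # 15 increments here
        samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each
        steps = int(consumed_samples / samples_per_increment)
        return min(start + steps * increment, target)

    for s in (0, 200_000, 1_000_000, 2_000_000):
        print(s, global_batch_size(s))  # 32, 64, 256, 512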
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
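The padded vocab size follows from make_vocab_size_divisible_by 128 and tensor_model_parallel_size 4: the vocab is grown until it splits into four equal shards, each a multiple of 128. A sketch of that arithmetic, assuming Megatron's padding rule:

    def padded_vocab_size(orig: int, divisible_by: int = 128, tp: int = 4) -> int:
        # each of the tp shards must stay a multiple of divisible_by, so pad
        # the total up to the next multiple of divisible_by * tp (512 here)
        multiple = divisible_by * tp
        return ((orig + multiple - 1) // multiple) * multiple

    print(padded_vocab_size(50257))  # 50688, i.e. 431 dummy tokens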
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-29 10:38:37,621] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
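The two seeds in that INFO line are related: 3952 = 1234 + 2718, consistent with the Megatron convention of offsetting the model-parallel RNG stream by a fixed 2718 plus the tensor-parallel rank, while data-parallel ranks keep the base seed. A hypothetical reimplementation of that derivation; the offset rule is an assumption, only the two logged values come from the run:

    def model_parallel_seed(base_seed: int, tp_rank: int) -> int:
        # assumed Megatron rule: fixed offset 2718, then one seed per TP rank,
        # so dropout differs across tensor-parallel ranks but stays reproducible
        return base_seed + 2718 + tp_rank

    assert model_parallel_seed(1234, 0) == 3952  # matches the log line above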
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
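The compiler-compatibility warning comes from PyTorch's JIT extension builder picking up `c++` instead of `g++` on the host. To my understanding the builder honors the CXX environment variable, so one possible workaround is to point it at g++ before any JIT build; a hedged sketch (the extension name and source file are placeholders, and Megatron-DeepSpeed actually builds its fused kernels through its own loader in megatron/fused_kernels):

import os
# Assumption: torch.utils.cpp_extension consults CXX when choosing
# the host compiler; setting it to g++ should avoid the warning.
os.environ.setdefault("CXX", "g++")

from torch.utils.cpp_extension import load  # noqa: E402

# Placeholder JIT build, commented out because my_op.cpp does not exist:
# my_op = load(name="my_op", sources=["my_op.cpp"], verbose=True)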
>>> done with compiling and loading fused kernels. Compilation time: 23.033 seconds
time to initialize megatron (seconds): 37.160
[after megatron is initialized] datetime: 2021-09-29 10:39:01
building GPT model ...
[2021-09-29 10:39:01,251] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-29 10:39:01,253] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-29 10:39:01,254] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.52 GB, percent = 21.1%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-29 10:39:01,777] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-09-29 10:39:02,150] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-29 10:39:02,151] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-29 10:39:02,151] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.9 GB, percent = 21.3%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
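The per-stage parameter counts above are internally consistent with a hidden size of 2048 and a padded vocabulary of 50688 per the stage arithmetic; both values are inferred from the numbers, not printed in this log. A quick sketch of the check:

# Inferred, not printed in the log: hidden size and padded vocab size.
HIDDEN = 2048
PADDED_VOCAB = 50_688
TP = 4  # tensor model parallel size, from the log

middle_stage = 75_592_704  # 6 transformer layers per TP rank (logged)
embed_shard = PADDED_VOCAB * HIDDEN // TP
assert embed_shard == 25_952_256

# First stage adds the embedding shard on top of its transformer layers.
assert middle_stage + embed_shard == 101_544_960  # logged for ranks (*, 0)

# Last stage also holds the final MixedFusedLayerNorm (weight + bias).
assert middle_stage + embed_shard + 2 * HIDDEN == 101_549_056  # ranks (*, 3)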
[2021-09-29 10:39:02,170] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-29 10:39:02,238] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-29 10:39:02,238] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-29 10:39:02,238] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-29 10:39:02,238] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-29 10:39:02,238] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-29 10:39:02,238] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-29 10:39:02,238] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-29 10:39:02,238] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-29 10:39:02,238] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-29 10:39:02,238] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-29 10:39:02,484] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-29 10:39:02,484] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-29 10:39:02,484] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-29 10:39:02,484] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-29 10:39:02,485] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-29 10:39:02,486] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-09-29 10:39:02,486] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-09-29 10:39:02,487] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
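The batch-size settings in the dump are internally consistent: train_batch_size equals micro batch size times gradient accumulation steps times the data-parallel group size (DeepSpeed's "world_size" of 4 here is the data-parallel group, not the 64-GPU job). A sketch of the check, pure arithmetic with values taken from the dump above:

micro_batch_per_gpu = 8
grad_accum_steps = 16
data_parallel_size = 4

train_batch_size = micro_batch_per_gpu * grad_accum_steps * data_parallel_size
assert train_batch_size == 512  # matches train_batch_size in the dump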
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-09-29 10:39:02,776] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for each of ranks 0-63
loading 4 zero partition checkpoints for each of ranks 0-63
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 32268
time (ms) | load-checkpoint: 2666.21
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
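Each rank loads 4 ZeRO partitions because ZeRO stage 1 shards the optimizer state across the 4 data-parallel replicas. A rough, hedged estimate of what that saves per GPU, using the stage-0 shard size from the engine report above (the 12 bytes/parameter figure is the usual rule of thumb for Adam with fp32 master weights and two fp32 moments, not something read from this log):

# Stage shard held by rank 0 (from the engine report above).
stage_params = 101_544_960
dp = 4  # ZeRO-1 partitions optimizer state across the DP group

# Assumption: fp32 master copy + Adam m + Adam v = ~12 bytes/param.
bytes_per_param = 12

full_state_gb = stage_params * bytes_per_param / 1e9  # ~1.22 GB unsharded
sharded_gb = full_state_gb / dp                       # ~0.30 GB per rank
print(f"{full_state_gb:.2f} GB unsharded vs {sharded_gb:.2f} GB with ZeRO-1")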
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters without embeddings: 1.209483264
be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: 
UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies 
of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-29 10:39:05
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.116380 seconds
    number of documents: 304230423
> dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.181 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.286 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-09-29 10:39:11
done with setup ...
training ...
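The doc-idx / sample-idx / shuffle-idx files above are plain NumPy arrays that get memory-mapped, which is why multi-GB indices load in well under a second. A minimal sketch of how they could be combined to fetch one sample, assuming the Megatron-style layout (sample_idx rows are (document pointer, token offset) pairs); the names and lookup logic here are illustrative, not the exact library code:

# Hypothetical simplification of the index-mapping lookup; not the
# actual Megatron-DeepSpeed GPTDataset implementation.
import numpy as np

prefix = "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s"

# mmap_mode="r" avoids reading the whole arrays into RAM, matching the
# sub-second "loaded indexed file" timings in the log above.
doc_idx = np.load(f"{prefix}_doc_idx.npy", mmap_mode="r")
sample_idx = np.load(f"{prefix}_sample_idx.npy", mmap_mode="r")
shuffle_idx = np.load(f"{prefix}_shuffle_idx.npy", mmap_mode="r")

def locate_sample(i: int):
    """Map a global sample id to (documents spanned, token offsets)."""
    j = shuffle_idx[i]                    # shuffled order -> storage order
    doc_start, tok_start = sample_idx[j]  # assumed (doc pointer, token offset) rows
    doc_end, tok_end = sample_idx[j + 1]
    return doc_idx[doc_start:doc_end + 1], tok_start, tok_end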
time (ms) | model-and-optimizer-setup: 4382.72 | train/valid/test-data-iterators-setup: 5591.94
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters: 1.209483264 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-09-29 10:39:12
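Each rank prints its own count, which is why several distinct numbers appear: ranks on different pipeline stages hold different partitions, and with PP > 1 the tied input/output embedding lives on both the first and last stage, so "with embeddings" totals double-count it (hence the UserWarning earlier). A sketch of the kind of per-rank count behind these lines (the real report lives in megatron/utils.py; this is just the idea, and the name-based embedding filter is an assumption):

import torch

def billions_of_parameters(model: torch.nn.Module, exclude_embeddings: bool = False) -> float:
    # Counts only the parameters materialized on this rank's partition.
    total = 0
    for name, p in model.named_parameters():
        if exclude_embeddings and "embedding" in name:  # name match is an assumption
            continue
        total += p.numel()
    return total / 1e9

# e.g. print(f"Number of parameters: {billions_of_parameters(model)} billion")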
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-29 10:39:12,238] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 49] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 48] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 32] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 16] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 0] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0
[Rank 18] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 2] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 34] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 50] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
[Rank 35] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 51] (after 32400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 19] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 17] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 33] (after 32400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 3] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 1] (after 32400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
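The per-rank memory report above corresponds to PyTorch's CUDA memory introspection (allocated vs. reserved caching-allocator pools). A minimal sketch of how such a line can be produced; the exact Megatron helper differs, this is just the idea:

import torch

def memory_status(rank: int, iteration: int) -> str:
    mb = 1024 * 1024
    return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
            f"allocated: {torch.cuda.memory_allocated() / mb} | "
            f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
            f"reserved: {torch.cuda.memory_reserved() / mb} | "
            f"max reserved: {torch.cuda.max_memory_reserved() / mb}")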
iteration 32400/ 152972 | consumed samples: 11509184 | elapsed time per iteration (ms): 6174.1 | learning rate: 1.890E-04 | global batch size: 512 | lm loss: 2.973449E+00 | loss scale: 131072.0 | grad norm: 9629.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32600/ 152972 | consumed samples: 11611584 | elapsed time per iteration (ms): 6118.3 | learning rate: 1.888E-04 | global batch size: 512 | lm loss: 2.966388E+00 | loss scale: 131072.0 | grad norm: 11739.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 32800/ 152972 | consumed samples: 11713984 | elapsed time per iteration (ms): 6140.5 | learning rate: 1.886E-04 | global batch size: 512 | lm loss: 2.966432E+00 | loss scale: 262144.0 | grad norm: 23456.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33000/ 152972 | consumed samples: 11816384 | elapsed time per iteration (ms): 6143.7 | learning rate: 1.884E-04 | global batch size: 512 | lm loss: 2.969148E+00 | loss scale: 262144.0 | grad norm: 25330.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 33000 | lm loss value: 2.924494E+00 | lm loss PPL: 1.862480E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 11:57:08,861] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step33000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1705.73
iteration 33200/ 152972 | consumed samples: 11918784 | elapsed time per iteration (ms): 7070.3 | learning rate: 1.882E-04 | global batch size: 512 | lm loss: 2.971375E+00 | loss scale: 524288.0 | grad norm: 52956.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33400/ 152972 | consumed samples: 12021184 | elapsed time per iteration (ms): 6158.1 | learning rate: 1.880E-04 | global batch size: 512 | lm loss: 2.975687E+00 | loss scale: 524288.0 | grad norm: 87945.062 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33600/ 152972 | consumed samples: 12123584 | elapsed time per iteration (ms): 6176.9 | learning rate: 1.878E-04 | global batch size: 512 | lm loss: 2.977509E+00 | loss scale: 524288.0 | grad norm: 49030.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 33800/ 152972 | consumed samples: 12225984 | elapsed time per iteration (ms): 6166.2 | learning rate: 1.876E-04 | global batch size: 512 | lm loss: 2.973132E+00 | loss scale: 1048576.0 | grad norm: 99941.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 13:39:56,240] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=71, lr=[0.00018739170352292736, 0.00018739170352292736], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 34000 loss: 2.9726 iter time (s): 0.003 samples/sec: 166224.229
iteration 34000/ 152972 | consumed samples: 12328384 | elapsed time per iteration (ms): 6172.6 | learning rate: 1.874E-04 | global batch size: 512 | lm loss: 2.976802E+00 | loss scale: 1048576.0 | grad norm: 106174.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 34000 | lm loss value: 2.924554E+00 | lm loss PPL: 1.862591E+01 |
-------------------------------------------------------------------------------------------------
iteration 34200/ 152972 | consumed samples: 12430784 | elapsed time per iteration (ms): 7086.6 | learning rate: 1.872E-04 | global batch size: 512 | lm loss: 2.975370E+00 | loss scale: 524288.0 | grad norm: 49506.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34400/ 152972 | consumed samples: 12533184 | elapsed time per iteration (ms): 6167.9 | learning rate: 1.870E-04 | global batch size: 512 | lm loss: 2.973793E+00 | loss scale: 524288.0 | grad norm: 52891.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 14:34:23,293] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step34500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.70
iteration 34600/ 152972 | consumed samples: 12635584 | elapsed time per iteration (ms): 6172.3 | learning rate: 1.868E-04 | global batch size: 512 | lm loss: 2.975412E+00 | loss scale: 524288.0 | grad norm: 49008.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 34800/ 152972 | consumed samples: 12737984 | elapsed time per iteration (ms): 6175.0 | learning rate: 1.865E-04 | global batch size: 512 | lm loss: 2.974226E+00 | loss scale: 1048576.0 | grad norm: 91979.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35000/ 152972 | consumed samples: 12840384 | elapsed time per iteration (ms): 6168.0 | learning rate: 1.863E-04 | global batch size: 512 | lm loss: 2.972278E+00 | loss scale: 1048576.0 | grad norm: 113143.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 35000 | lm loss value: 2.925368E+00 | lm loss PPL: 1.864109E+01 |
-------------------------------------------------------------------------------------------------
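The reported "lm loss PPL" is simply exp of the lm loss. A quick check against the validation block just above:

import math

lm_loss = 2.925368        # validation lm loss at iteration 35000
print(math.exp(lm_loss))  # -> 18.6411..., matching "lm loss PPL: 1.864109E+01"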
iteration 35200/ 152972 | consumed samples: 12942784 | elapsed time per iteration (ms): 7119.1 | learning rate: 1.861E-04 | global batch size: 512 | lm loss: 2.974394E+00 | loss scale: 1048576.0 | grad norm: 103442.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35400/ 152972 | consumed samples: 13045184 | elapsed time per iteration (ms): 6154.3 | learning rate: 1.859E-04 | global batch size: 512 | lm loss: 2.971284E+00 | loss scale: 1048576.0 | grad norm: 110331.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35600/ 152972 | consumed samples: 13147584 | elapsed time per iteration (ms): 6159.1 | learning rate: 1.857E-04 | global batch size: 512 | lm loss: 2.965182E+00 | loss scale: 1048576.0 | grad norm: 110840.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 35800/ 152972 | consumed samples: 13249984 | elapsed time per iteration (ms): 6181.3 | learning rate: 1.855E-04 | global batch size: 512 | lm loss: 2.970983E+00 | loss scale: 1048576.0 | grad norm: 94889.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 17:11:48,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=76, lr=[0.00018523568489549322, 0.00018523568489549322], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 36000 loss: 2.9970 iter time (s): 0.003 samples/sec: 168003.611
iteration 36000/ 152972 | consumed samples: 13352384 | elapsed time per iteration (ms): 6179.5 | learning rate: 1.852E-04 | global batch size: 512 | lm loss: 2.971390E+00 | loss scale: 1048576.0 | grad norm: 101616.094 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 36000 | lm loss value: 2.919620E+00 | lm loss PPL: 1.853424E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 17:14:43,538] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step36000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1636.07
iteration 36200/ 152972 | consumed samples: 13454784 | elapsed time per iteration (ms): 7052.5 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 2.968068E+00 | loss scale: 2097152.0 | grad norm: 202902.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36400/ 152972 | consumed samples: 13557184 | elapsed time per iteration (ms): 6180.9 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 2.967202E+00 | loss scale: 1048576.0 | grad norm: 103053.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36600/ 152972 | consumed samples: 13659584 | elapsed time per iteration (ms): 6183.0 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 2.966201E+00 | loss scale: 1048576.0 | grad norm: 197510.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 36800/ 152972 | consumed samples: 13761984 | elapsed time per iteration (ms): 6169.7 | learning rate: 1.843E-04 | global batch size: 512 | lm loss: 2.965993E+00 | loss scale: 2097152.0 | grad norm: 203305.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37000/ 152972 | consumed samples: 13864384 | elapsed time per iteration (ms): 6174.4 | learning rate: 1.841E-04 | global batch size: 512 | lm loss: 2.966695E+00 | loss scale: 2097152.0 | grad norm: 217254.054 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 37000 | lm loss value: 2.913020E+00 | lm loss PPL: 1.841231E+01 |
-------------------------------------------------------------------------------------------------
iteration 37200/ 152972 | consumed samples: 13966784 | elapsed time per iteration (ms): 7032.6 | learning rate: 1.839E-04 | global batch size: 512 | lm loss: 3.130006E+00 | loss scale: 65536.0 | grad norm: 7709.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37400/ 152972 | consumed samples: 14069184 | elapsed time per iteration (ms): 6146.5 | learning rate: 1.836E-04 | global batch size: 512 | lm loss: 2.987290E+00 | loss scale: 65536.0 | grad norm: 6446.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 19:51:53,237] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step37500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1566.03
iteration 37600/ 152972 | consumed samples: 14171584 | elapsed time per iteration (ms): 6167.1 | learning rate: 1.834E-04 | global batch size: 512 | lm loss: 2.970526E+00 | loss scale: 65536.0 | grad norm: 6792.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 37800/ 152972 | consumed samples: 14273984 | elapsed time per iteration (ms): 6172.5 | learning rate: 1.832E-04 | global batch size: 512 | lm loss: 2.965726E+00 | loss scale: 131072.0 | grad norm: 12976.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-29 20:43:19,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=84, lr=[0.00018292848940383894, 0.00018292848940383894], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 38000 loss: 2.9558 iter time (s): 0.003 samples/sec: 168470.490
iteration 38000/ 152972 | consumed samples: 14376384 | elapsed time per iteration (ms): 6176.6 | learning rate: 1.829E-04 | global batch size: 512 | lm loss: 2.964112E+00 | loss scale: 131072.0 | grad norm: 12877.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 38000 | lm loss value: 2.913800E+00 | lm loss PPL: 1.842670E+01 |
-------------------------------------------------------------------------------------------------
iteration 38200/ 152972 | consumed samples: 14478784 | elapsed time per iteration (ms): 7087.6 | learning rate: 1.827E-04 | global batch size: 512 | lm loss: 2.960802E+00 | loss scale: 262144.0 | grad norm: 26585.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38400/ 152972 | consumed samples: 14581184 | elapsed time per iteration (ms): 6173.9 | learning rate: 1.824E-04 | global batch size: 512 | lm loss: 2.955464E+00 | loss scale: 262144.0 | grad norm: 23892.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38600/ 152972 | consumed samples: 14683584 | elapsed time per iteration (ms): 6163.8 | learning rate: 1.822E-04 | global batch size: 512 | lm loss: 2.960490E+00 | loss scale: 262144.0 | grad norm: 24490.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 38800/ 152972 | consumed samples: 14785984 | elapsed time per iteration (ms): 6180.3 | learning rate: 1.820E-04 | global batch size: 512 | lm loss: 2.954077E+00 | loss scale: 524288.0 | grad norm: 50095.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39000/ 152972 | consumed samples: 14888384 | elapsed time per iteration (ms): 6174.1 | learning rate: 1.817E-04 | global batch size: 512 | lm loss: 2.953341E+00 | loss scale: 524288.0 | grad norm: 64409.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 39000 | lm loss value: 2.906075E+00 | lm loss PPL: 1.828489E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-29 22:32:16,005] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step39000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1569.33
iteration 39200/ 152972 | consumed samples: 14990784 | elapsed time per iteration (ms): 7094.6 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 2.957802E+00 | loss scale: 1048576.0 | grad norm: 98465.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39400/ 152972 | consumed samples: 15093184 | elapsed time per iteration (ms): 6183.9 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 2.951240E+00 | loss scale: 1048576.0 | grad norm: 98828.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39600/ 152972 | consumed samples: 15195584 | elapsed time per iteration (ms): 6190.8 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.954536E+00 | loss scale: 1048576.0 | grad norm: 102900.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 39800/ 152972 | consumed samples: 15297984 | elapsed time per iteration (ms): 6195.5 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 2.950327E+00 | loss scale: 1048576.0 | grad norm: 99370.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 00:15:24,572] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=90, lr=[0.00018046888949924708, 0.00018046888949924708], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 40000 loss: 2.9342 iter time (s): 0.003 samples/sec: 169236.805
iteration 40000/ 152972 | consumed samples: 15400384 | elapsed time per iteration (ms): 6178.3 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 2.963440E+00 | loss scale: 65536.0 | grad norm: 6475.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 40000 | lm loss value: 2.906275E+00 | lm loss PPL: 1.828854E+01 |
-------------------------------------------------------------------------------------------------
iteration 40200/ 152972 | consumed samples: 15502784 | elapsed time per iteration (ms): 7064.6 | learning rate: 1.802E-04 | global batch size: 512 | lm loss: 2.959289E+00 | loss scale: 65536.0 | grad norm: 6584.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40400/ 152972 | consumed samples: 15605184 | elapsed time per iteration (ms): 6162.9 | learning rate: 1.800E-04 | global batch size: 512 | lm loss: 2.953585E+00 | loss scale: 131072.0 | grad norm: 13519.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 01:09:45,403] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step40500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1667.24
iteration 40600/ 152972 | consumed samples: 15707584 | elapsed time per iteration (ms): 6148.6 | learning rate: 1.797E-04 | global batch size: 512 | lm loss: 2.950395E+00 | loss scale: 131072.0 | grad norm: 12445.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 40800/ 152972 | consumed samples: 15809984 | elapsed time per iteration (ms): 6136.1 | learning rate: 1.794E-04 | global batch size: 512 | lm loss: 2.950941E+00 | loss scale: 131072.0 | grad norm: 13683.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41000/ 152972 | consumed samples: 15912384 | elapsed time per iteration (ms): 6163.9 | learning rate: 1.792E-04 | global batch size: 512 | lm loss: 2.943672E+00 | loss scale: 262144.0 | grad norm: 26293.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 41000 | lm loss value: 2.898256E+00 | lm loss PPL: 1.814247E+01 |
-------------------------------------------------------------------------------------------------
iteration 41200/ 152972 | consumed samples: 16014784 | elapsed time per iteration (ms): 7065.0 | learning rate: 1.789E-04 | global batch size: 512 | lm loss: 2.951874E+00 | loss scale: 65536.0 | grad norm: 6057.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41400/ 152972 | consumed samples: 16117184 | elapsed time per iteration (ms): 6176.4 | learning rate: 1.787E-04 | global batch size: 512 | lm loss: 2.950067E+00 | loss scale: 65536.0 | grad norm: 6836.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41600/ 152972 | consumed samples: 16219584 | elapsed time per iteration (ms): 6167.0 | learning rate: 1.784E-04 | global batch size: 512 | lm loss: 2.961946E+00 | loss scale: 131072.0 | grad norm: 13430.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 41800/ 152972 | consumed samples: 16321984 | elapsed time per iteration (ms): 6142.8 | learning rate: 1.781E-04 | global batch size: 512 | lm loss: 2.945664E+00 | loss scale: 131072.0 | grad norm: 14303.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 03:46:38,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=93, lr=[0.00017785983799521653, 0.00017785983799521653], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 42000/ 152972 | consumed samples: 16424384 | elapsed time per iteration (ms): 6143.5 | learning rate: 1.779E-04 | global batch size: 512 | lm loss: 2.945719E+00 | loss scale: 131072.0 | grad norm: 13233.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 42000 loss: 2.9588 iter time (s): 0.003 samples/sec: 168378.949
-------------------------------------------------------------------------------------------------
 validation loss at iteration 42000 | lm loss value: 2.893356E+00 | lm loss PPL: 1.805379E+01 |
-------------------------------------------------------------------------------------------------
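Two quick consistency checks on the lines above: with a global batch size of 512, each 200-iteration logging interval consumes 512 * 200 = 102400 samples, and the "samples/sec" figure is just batch size divided by per-step time.

interval = 16424384 - 16321984   # consumed samples, iteration 42000 minus 41800
assert interval == 512 * 200     # -> 102400

iter_time_s = 512 / 168378.949   # from "samples/sec" at step 42000
print(f"{iter_time_s:.4f}")      # ~0.0030 s, matching "iter time (s): 0.003"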
saving checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 03:49:40,202] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step42000/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1490.10
iteration 42200/ 152972 | consumed samples: 16526784 | elapsed time per iteration (ms): 7082.1 | learning rate: 1.776E-04 | global batch size: 512 | lm loss: 2.941561E+00 | loss scale: 262144.0 | grad norm: 24312.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42400/ 152972 | consumed samples: 16629184 | elapsed time per iteration (ms): 6179.2 | learning rate: 1.773E-04 | global batch size: 512 | lm loss: 2.945879E+00 | loss scale: 262144.0 | grad norm: 27153.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42600/ 152972 | consumed samples: 16731584 | elapsed time per iteration (ms): 6185.8 | learning rate: 1.770E-04 | global batch size: 512 | lm loss: 2.938939E+00 | loss scale: 262144.0 | grad norm: 25700.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 42800/ 152972 | consumed samples: 16833984 | elapsed time per iteration (ms): 6163.8 | learning rate: 1.768E-04 | global batch size: 512 | lm loss: 2.940046E+00 | loss scale: 524288.0 | grad norm: 49709.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43000/ 152972 | consumed samples: 16936384 | elapsed time per iteration (ms): 6177.1 | learning rate: 1.765E-04 | global batch size: 512 | lm loss: 2.939341E+00 | loss scale: 524288.0 | grad norm: 47217.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 43000 | lm loss value: 2.885082E+00 | lm loss PPL: 1.790504E+01 |
-------------------------------------------------------------------------------------------------
iteration 43200/ 152972 | consumed samples: 17038784 | elapsed time per iteration (ms): 7102.1 | learning rate: 1.762E-04 | global batch size: 512 | lm loss: 2.938433E+00 | loss scale: 524288.0 | grad norm: 50119.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43400/ 152972 | consumed samples: 17141184 | elapsed time per iteration (ms): 6203.0 | learning rate: 1.759E-04 | global batch size: 512 | lm loss: 2.934588E+00 | loss scale: 1048576.0 | grad norm: 106032.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 06:27:17,450] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step43500/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1662.47
saving checkpoint at iteration 43511 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 06:28:27,053] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step43511/mp_rank_00_model_states.pt
  successfully saved checkpoint at iteration 43511 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.64
[exiting program after 1190.0435673157374 minutes] datetime: 2021-09-30 06:28:28
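The off-interval save at step 43511, immediately followed by the exit message, suggests a time-limited run: an extra checkpoint is written right before the job quits after roughly 1190 minutes. A hedged sketch of that pattern; here "engine" stands for a DeepSpeed engine (engine.save_checkpoint is the real DeepSpeed API), while the duration flag and loop structure are illustrative assumptions:

import time

train_start = time.time()
exit_duration_minutes = 1190  # e.g. an --exit-duration-in-mins style setting (assumption)

def maybe_exit_and_save(engine, step: int, save_dir: str) -> None:
    elapsed_min = (time.time() - train_start) / 60
    if elapsed_min > exit_duration_minutes:
        # DeepSpeed writes mp_rank_*_model_states.pt under global_step<step>/
        engine.save_checkpoint(save_dir, tag=f"global_step{step}")
        raise SystemExit(f"[exiting program after {elapsed_min} minutes]")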
2021-09-30 06:29:12.835108: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
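Every op in the table is compatible but only cpu_adam ships pre-installed, so the fused and sparse ops are JIT-compiled at first use, which is why each rank checks for ninja first. A minimal sketch of an equivalent pre-flight check (illustrative, not DeepSpeed's own probe):

import shutil

# JIT compilation of the C++/CUDA extension ops requires the ninja build tool.
status = "[OKAY]" if shutil.which("ninja") else "[FAIL]"
print(f"ninja .................. {status}")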
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
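The async_io op is the one piece reported unusable: it backs DeepSpeed's asynchronous NVMe offload and needs libaio, which these nodes lack. A hedged sketch of detecting the missing library at runtime (a ctypes lookup, not DeepSpeed's actual check):

import ctypes.util

# async_io needs libaio; on Debian/Ubuntu: `apt install libaio-dev`.
if ctypes.util.find_library("aio") is None:
    print("[WARNING] libaio not found; async_io will stay [NO]")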
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
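The torch lines of this report can be re-derived from a Python prompt; a minimal sketch (the nvcc line in the real report comes from running nvcc, omitted here):

import os
import torch

print("torch install path ...", os.path.dirname(torch.__file__))
print("torch version ........", torch.__version__)   # 1.8.1 in this run
print("torch cuda version ...", torch.version.cuda)  # 11.1 in this run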
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
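The `type: git: not found` line explains the unknown hash: the compute nodes expose no git binary, so the repo probe falls back to "unknown". A sketch of that fallback pattern (illustrative, not Megatron's exact code):

import subprocess

def git_hash() -> str:
    # Mirror the "git_hash=unknown" banner when git is absent
    # or the working tree is not a repository.
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return "unknown"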
> setting tensorboard ...
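"> setting tensorboard ..." is where the trainer attaches its event writer; scalars like the lm loss above end up as TensorBoard series. A minimal sketch with torch's SummaryWriter (the log dir and tag names are illustrative, not the run's actual ones):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="tensorboard/tr4c-1B3-rotary-oscar")  # hypothetical path
writer.add_scalar("lm-loss", 2.938433, global_step=43200)
writer.add_scalar("loss-scale", 524288.0, global_step=43200)
writer.close()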
[NO][NO] sparse_attn.............. ............[OKAY][OKAY] sparse_attn [NO] ................... [NO][OKAY] ....... [OKAY]transformer ............sparse_attn transformer [NO]sparse_attn ............ ............ ....... ............[OKAY] [NO] [NO] [NO] ....... ....... .......[OKAY]stochastic_transformer [OKAY][OKAY]. stochastic_transformertransformer[NO] transformer ........ ............ ............ [NO][OKAY] [NO] [NO] ....... ..............[OKAY] [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op name ................op name ................ ................installed................ installedinstalledinstalled ........ compatiblecompatible --------------------------------------------------compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam ..................... ............... ...............[YES][OKAY][YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] fused_adam....... fused_adamfused_adam............. [OKAY] .............[NO]............. fused_lamb[NO] .......[NO] ....................[OKAY] [NO]....... [OKAY] ....... [OKAY]fused_lamb [OKAY]fused_lamb............. .............fused_lamb[NO] [NO].................... [NO][OKAY]....... [OKAY]sparse_attn ................... [OKAY] [NO] ....... sparse_attn[OKAY] ............ sparse_attn[NO]transformer ............................... [OKAY][NO][NO] sparse_attn .......transformer....... ............[OKAY]............[OKAY] [NO][NO] transformer ....... .......stochastic_transformer............[OKAY] [OKAY] . [NO][NO] .......transformerstochastic_transformer....... [OKAY]............ [OKAY]. [NO][NO] ..............stochastic_transformer . [NO] .......[OKAY] [OKAY] [OKAY] stochastic_transformer . [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY]--------------------------------------------------[OKAY] -------------------------------------------------- --------------------------------------------------op name op name-------------------------------------------------- op name ................ ................installed................op name installed installed ................ ...... compatiblecompatibleinstalledcompatible ..---------------------------------------------------------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam .............................. [YES]...............[YES] cpu_adam ......[YES] ......[OKAY]..................... [OKAY][YES][OKAY] ...... [OKAY] fused_adam fused_adam.............fused_adam .............[NO]............. [NO].......[NO] .......[OKAY].......fused_adam [OKAY][OKAY]............. fused_lamb [NO].............fused_lambfused_lamb [NO] ....... ............. .................... [OKAY] [NO][OKAY][NO] .............. [OKAY][OKAY] fused_lamb ............. [NO] .......sparse_attn [OKAY]............ sparse_attnsparse_attn [NO]........................ .......[NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] transformer ............ [NO]transformer sparse_attntransformer ....... ........................ [OKAY]............[NO][NO] .............. [NO][OKAY] stochastic_transformer [OKAY]....... . [OKAY]stochastic_transformer[NO]stochastic_transformer ......... transformer [OKAY] [NO][NO] ............ .............. [NO][OKAY][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................................ ................ ................installedinstalled installedinstalled.... ....compatiblecompatible compatible---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam............... 
..............................cpu_adam [YES] [YES] ...............[YES] ...... ...... ...... [YES][OKAY][OKAY] ......[OKAY] [OKAY] fused_adam fused_adam............. .............fused_adamfused_adam[NO] ............. [NO]....... ............. [NO] [OKAY] .......[NO] .......[OKAY]....... fused_lamb [OKAY][OKAY] ............. [NO]fused_lamb fused_lamb fused_lamb....... ............. ..........................[OKAY][NO] [NO].......[NO] .......[OKAY]....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attntransformersparse_attn ............ ........................ ............ [NO] [NO][NO] [NO].............. .............. [OKAY] [OKAY] [OKAY] [OKAY] transformertransformertransformer stochastic_transformer.................................... [NO] [NO][NO] . ..................... [OKAY][OKAY][NO][OKAY] ....... [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer .. . [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name ................op name................................ installedinstalled................installed .. installed.... compatiblecompatiblecompatible .. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adamcpu_adam cpu_adam............... ............................................. [YES] [YES][YES] [YES] ...... ............ ...... [OKAY][OKAY] [OKAY] [OKAY] fused_adam fused_adamfused_adam.............fused_adam ............. .............[NO] .............[NO][NO] ....... [NO].............. [OKAY].......[OKAY] [OKAY] fused_lamb[OKAY] fused_lambfused_lamb............. ..........................fused_lamb[NO] [NO][NO].................... ....... [NO].......[OKAY] [OKAY][OKAY]....... [OKAY] sparse_attn ............sparse_attn sparse_attnsparse_attn[NO]............ ............ .......[NO][NO]............ [OKAY].............. [NO] [OKAY][OKAY] transformer....... transformer............[OKAY] transformer............[NO] transformer [NO]............ .............. ............ [NO] [OKAY]....... [OKAY] [NO] [OKAY]stochastic_transformer ....... stochastic_transformer.[OKAY] stochastic_transformer . [NO] .[NO] stochastic_transformer ..............[NO] .[OKAY][OKAY]....... [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
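This is DeepSpeed's standard extension-op summary (the bundled ds_report utility prints the same tables). The per-op checks can also be queried programmatically; a minimal sketch, assuming a DeepSpeed 0.4.x install where each extension exposes an op builder with an is_compatible() probe:

# Minimal sketch: mirror the op report above by asking each op builder
# whether the system can JIT-compile its extension (deepspeed 0.4.x assumed).
from deepspeed.ops.op_builder import AsyncIOBuilder, CPUAdamBuilder, FusedAdamBuilder

for builder in (AsyncIOBuilder(), CPUAdamBuilder(), FusedAdamBuilder()):
    # async_io prints False because libaio-dev is missing, matching the
    # [WARNING] and the "async_io ... [NO]" row above.
    print(builder.name, builder.is_compatible())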
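A row such as fused_adam [NO] / [OKAY] means the op is not pre-built but can be compiled on first use. As a hedged illustration (requires a CUDA device and ninja; FusedAdam is DeepSpeed's fused Adam optimizer, and the parameter shape here is arbitrary):

# Hedged sketch: constructing FusedAdam triggers the ninja JIT build of the
# fused_adam extension when it was not pre-installed at install time.
import torch
from deepspeed.ops.adam import FusedAdam

params = [torch.nn.Parameter(torch.randn(8, 8, device="cuda"))]
optimizer = FusedAdam(params, lr=1e-3)  # JIT-compiles fused_adam if needed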
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
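Each field in this environment report maps onto a public attribute; a minimal sketch that reproduces it (comments show the values logged here):

# Minimal sketch: reproduce the environment report with public APIs.
import torch
import deepspeed

print(torch.__path__)         # torch install path (a one-element list)
print(torch.__version__)      # 1.8.1
print(torch.version.cuda)     # 11.1, the CUDA toolkit torch was built against
print(deepspeed.__version__)  # 0.4.2+72ce55a

Note that torch cuda version (11.1, baked into the wheel) and nvcc version (11.2, the toolkit found on the machine) differ at the minor version; as this log shows, the run proceeds with that minor mismatch.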
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
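The unknown fields follow directly from the "type: git: not found" line: the compute nodes have no git binary, so the launcher's probe falls back to placeholders. A hedged reconstruction of such a probe (illustrative only, not Megatron's exact code):

# Hedged sketch (not Megatron's exact implementation): ask git for the
# current hash/branch and fall back to "unknown", matching the banner above.
import subprocess

def git_info():
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
        git_branch = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash, git_branch = "unknown", "unknown"
    return git_hash, git_branch

print("**** Git info for Megatron: git_hash={} git_branch={} ****".format(*git_info()))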
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version ....................torch version 1.8.1.................... 1.8.1torch cuda version ...............torch cuda version 11.1............... nvcc version11.1 .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
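Note: the repeated "/bin/sh: line 0: type: git: not found" lines come from each rank probing for git on compute nodes where it is not installed, so the banner falls back to git_hash=unknown. A minimal sketch of such a probe (hypothetical helper, not Megatron's exact code):

    import subprocess

    def git_info():
        """Best-effort git metadata; falls back to "unknown" when git is
        absent (e.g. on compute nodes), producing the banner seen above."""
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "--short", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
            git_branch = subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                stderr=subprocess.DEVNULL).decode().strip()
        except (OSError, subprocess.CalledProcessError):
            git_hash = git_branch = "unknown"
        return git_hash, git_branch

    print("**** Git info for Megatron: git_hash={} git_branch={} ****".format(*git_info()))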
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1309839.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-09-30 06:29:25,117] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.326 seconds
> compiling and loading fused kernels ...
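Note: two derived quantities reported above are easy to verify. The vocab is padded so each of the 4 tensor-parallel shards is a multiple of make_vocab_size_divisible_by=128 (hence a multiple of 512), and rampup_batch_size=['32', '32', '2_000_000'] grows the global batch from 32 to 512 in steps of 32 over 2M samples. A minimal sketch with the logged values (not Megatron-DeepSpeed's actual code):

    def pad_vocab_size(orig_vocab_size: int, divisible_by: int, tp_size: int) -> int:
        """Pad the vocab so each of the tp_size shards is a multiple of divisible_by."""
        multiple = divisible_by * tp_size
        return ((orig_vocab_size + multiple - 1) // multiple) * multiple

    # padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
    assert pad_vocab_size(50257, 128, 4) == 50257 + 431 == 50688

    def global_batch_size(consumed_samples: int,
                          start: int = 32, increment: int = 32,
                          final: int = 512, ramp_samples: int = 2_000_000) -> int:
        """Linear rampup: add `increment` in equal sample intervals until `final`."""
        n_increments = (final - start) // increment          # 15 steps of 32
        samples_per_increment = ramp_samples // n_increments
        return min(final, start + increment * (consumed_samples // samples_per_increment))

    assert global_batch_size(0) == 32
    assert global_batch_size(2_000_000) == 512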
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on Linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 21.369 seconds
time to initialize megatron (seconds): 58.859
[after megatron is initialized] datetime: 2021-09-30 06:29:46
building GPT model ...
[2021-09-30 06:29:46,974] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-09-30 06:29:46,976] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-09-30 06:29:46,976] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.92 GB, percent = 21.3%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-09-30 06:29:47,500] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
  0: _to_float16
  1: EmbeddingPipe
  2:
  3: ParallelTransformerLayerPipe
  4: ParallelTransformerLayerPipe
  5: ParallelTransformerLayerPipe
  6: ParallelTransformerLayerPipe
  7: ParallelTransformerLayerPipe
  8: ParallelTransformerLayerPipe
stage=1 layers=6
  9: ParallelTransformerLayerPipe
  10: ParallelTransformerLayerPipe
  11: ParallelTransformerLayerPipe
  12: ParallelTransformerLayerPipe
  13: ParallelTransformerLayerPipe
  14: ParallelTransformerLayerPipe
stage=2 layers=6
  15: ParallelTransformerLayerPipe
  16: ParallelTransformerLayerPipe
  17: ParallelTransformerLayerPipe
  18: ParallelTransformerLayerPipe
  19: ParallelTransformerLayerPipe
  20: ParallelTransformerLayerPipe
stage=3 layers=10
  21: ParallelTransformerLayerPipe
  22: ParallelTransformerLayerPipe
  23: ParallelTransformerLayerPipe
  24: ParallelTransformerLayerPipe
  25: ParallelTransformerLayerPipe
  26: ParallelTransformerLayerPipe
  27:
  28: MixedFusedLayerNorm
  29: EmbeddingPipe
  30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
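Note: the topology above is the 3D grid 64 = pp(4) x dp(4) x tp(4), with the model (tensor) coordinate varying fastest, then data, then pipe, i.e. rank = pipe*16 + data*4 + model. The uneven 9/6/6/10 stage split places the embedding and final-LayerNorm layers on the outer stages, which is why first- and last-stage ranks hold ~101.5M parameters versus ~75.6M for the middle stages. A small illustrative sketch of the mapping (not DeepSpeed's actual topology class):

    from itertools import product

    TP, DP, PP = 4, 4, 4  # tensor-, data-, pipeline-parallel degrees (world size 64)

    def coord_to_rank(pipe, data, model):
        # model varies fastest, then data, then pipe -- matching the printed topology
        return (pipe * DP + data) * TP + model

    topology = {(p, d, m): coord_to_rank(p, d, m)
                for p, d, m in product(range(PP), range(DP), range(TP))}

    assert topology[(0, 0, 1)] == 1
    assert topology[(1, 0, 0)] == 16
    assert topology[(3, 3, 3)] == 63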
[2021-09-30 06:29:47,889] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-09-30 06:29:47,889] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-09-30 06:29:47,890] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.31 GB, percent = 21.5%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-09-30 06:29:47,910] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-09-30 06:29:47,984] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-09-30 06:29:47,984] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-09-30 06:29:47,984] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-09-30 06:29:47,985] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-09-30 06:29:47,985] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-09-30 06:29:47,985] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-09-30 06:29:47,985] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-09-30 06:29:47,985] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-09-30 06:29:47,985] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-09-30 06:29:47,985] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-09-30 06:29:48,233] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-09-30 06:29:48,233] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-09-30 06:29:48,233] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-09-30 06:29:48,233] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] amp_params ................... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] dump_state ................... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] global_rank .................. 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-09-30 06:29:48,234] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] pld_params ................... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] world_size ................... 4
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-09-30 06:29:48,235] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
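Note: in the engine configuration above, "world_size 4" is the data-parallel group size (the 64 GPUs divided by tp=4 x pp=4), and the batch sizes satisfy DeepSpeed's invariant train_batch_size = micro_batch x grad_accum x dp_size. A one-line consistency check with the printed values:

    micro_batch_size = 8       # train_micro_batch_size_per_gpu
    grad_accum_steps = 16      # gradient_accumulation_steps
    data_parallel_size = 4     # "world_size" of the DeepSpeed engine = DP dimension
    assert micro_batch_size * grad_accum_steps * data_parallel_size == 512  # train_batch_size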
1 [2021-09-30 06:29:48,235] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-09-30 06:29:48,236] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8 [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M) [2021-09-30 06:29:48,526] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) 
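A quick sanity check of the parallelism arithmetic in the dump above (a plain-Python sketch, not part of the training code; the tensor-parallel degree is inferred, since it is not printed anywhere in this log):

    # Check the batch-size bookkeeping reported by the DeepSpeed config dump.
    micro_batch_size = 8      # train_micro_batch_size_per_gpu
    grad_accum_steps = 16     # gradient_accumulation_steps
    dp_world_size = 4         # world_size in the config = data-parallel degree

    assert micro_batch_size * grad_accum_steps * dp_world_size == 512  # train_batch_size

    # engine.py lists RANK=0..63 spread over STAGE=0..3, so with DP=4 and
    # PP=4 the remaining factor must be the tensor-parallel degree:
    total_ranks, pp_stages = 64, 4
    tp_degree = total_ranks // (dp_world_size * pp_stages)
    print(tp_degree)  # -> 4 (inferred)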
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 0
loading 4 zero partition checkpoints for rank 0
[the two messages above repeat, interleaved, once for every rank 0-63; the remaining per-rank copies are elided]
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 43511
time (ms) | load-checkpoint: 2112.57
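The restored schedule values above pin down the learning-rate curve: cosine decay from 2e-4 down to 1e-5 after 183105 warmup "iterations" out of 73242187 total. The total coincides with the train sample target reported during dataset build below, so the schedule evidently counts samples rather than optimizer steps. A minimal sketch of a Megatron-style cosine-with-warmup schedule under that assumption; it reproduces the learning rate printed at iteration 43600 (consumed samples 17243584) to within about 0.1%:

    import math

    # Schedule values restored from the checkpoint (units: training samples).
    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_samples = 183_105
    total_samples = 73_242_187

    def cosine_lr(consumed_samples: int) -> float:
        """Linear warmup, then cosine decay from max_lr to min_lr."""
        if consumed_samples < warmup_samples:
            return max_lr * consumed_samples / warmup_samples
        progress = (consumed_samples - warmup_samples) / (total_samples - warmup_samples)
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

    # Iteration 43600 below reports consumed samples 17243584 and lr 1.757E-04.
    print(f"{cosine_lr(17_243_584):.3e}")  # -> ~1.756e-04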
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
[the warning above is emitted once per rank; the remaining verbatim copies are elided]
estimated model parameters: 1.209483264 / 1.62471936 / 1.624784896
estimated model parameters without embeddings: 1.209483264 / 1.2095488
[each rank prints its own estimate, so these lines repeat, interleaved, once per rank with one of the values above; the duplicates are elided]
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-09-30 06:29:50
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.131078 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.159 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.114 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.058 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
Number of parameters: 1.62471936 billion [printed three times; duplicate copies elided]
time (ms) | model-and-optimizer-setup: 3873.68 | train/valid/test-data-iterators-setup: 5109.17
[after dataloaders are built] datetime: 2021-09-30 06:30:01
Number of parameters without embeddings: 1.209483264 billion [printed three times; duplicate copies elided]
done with setup ...
training ...
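The epoch counts above follow from simple arithmetic: the index builder repeats a split for as many epochs as it needs to serve the requested number of 2048-token samples. A sketch of that calculation (the real logic, which also handles shuffling and a partial final epoch, lives in megatron/data/gpt_dataset.py):

    import math

    # Each "sample" is one 2048-token sequence packed from the split's documents.
    def epochs_needed(target_samples: int, samples_per_epoch: int) -> int:
        return max(1, math.ceil(target_samples / samples_per_epoch))

    # Validation: 13_854_322 total samples over 2 epochs -> ~6_927_161 per epoch,
    # which is less than the 7_833_600 target, hence the second epoch.
    print(epochs_needed(7_833_600, 13_854_322 // 2))    # -> 2, as logged
    # Train: one epoch already yields 131_537_224 samples > 73_242_187 target.
    print(epochs_needed(73_242_187, 131_537_224))       # -> 1, as logged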
Number of parameters: 1.62471936 / 1.624784896 / 1.209483264 billion [one line per rank, value depending on its pipeline stage; the repeated per-rank lines are elided]
Number of parameters without embeddings: 1.209483264 / 1.2095488 billion [likewise repeated per rank and elided]
[before the start of training step] datetime: 2021-09-30 06:30:01
[2021-09-30 06:30:01,615] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-09-30 06:30:01,616] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
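The fp16 block of the config above (loss_scale 0 selects dynamic scaling; initial_scale_power 12 gives an initial scale of 4096; loss_scale_window 500; hysteresis 2; min_loss_scale 1) is what produces the loss-scale jumps and "skipped iterations" counts in the iteration logs below: overflowing steps are skipped and the scale halves, while a long run of clean steps doubles it. A minimal sketch of this style of scaler, not DeepSpeed's actual implementation (bookkeeping details such as when the hysteresis counter resets differ):

    # Sketch of a dynamic fp16 loss scaler with the configured hyperparameters.
    class DynamicLossScaler:
        def __init__(self, init_scale=4096.0, scale_window=500,
                     hysteresis=2, min_scale=1.0):
            self.scale = init_scale
            self.scale_window = scale_window      # clean steps before doubling
            self.hysteresis = hysteresis          # tolerated consecutive overflows
            self.min_scale = min_scale
            self._good_steps = 0
            self._overflows_left = hysteresis

        def update(self, overflow: bool) -> bool:
            """Return True if this optimizer step should be skipped."""
            if overflow:
                self._good_steps = 0
                self._overflows_left -= 1
                if self._overflows_left <= 0:     # halve only past hysteresis
                    self.scale = max(self.scale / 2, self.min_scale)
                    self._overflows_left = self.hysteresis
                return True
            self._good_steps += 1
            if self._good_steps % self.scale_window == 0:
                self.scale *= 2                   # e.g. 1048576 -> 2097152 below
            return False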
[Rank 18] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4796.0 | max reserved: 4796.0
[Rank 2] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 34] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 50] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 33] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 17] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 1] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 49] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0
[Rank 16] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 32] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4172.0 | max reserved: 4172.0
[Rank 0] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0
[Rank 48] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0
[Rank 19] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 51] (after 43600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7086.0 | max reserved: 7086.0
[Rank 3] (after 43600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 35] (after 43600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4204.0 | max reserved: 4204.0
iteration 43600/ 152972 | consumed samples: 17243584 | elapsed time per iteration (ms): 6174.0 | learning rate: 1.757E-04 | global batch size: 512 | lm loss: 2.919600E+00 | loss scale: 1048576.0 | grad norm: 74040.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 43800/ 152972 | consumed samples: 17345984 | elapsed time per iteration (ms): 6064.1 | learning rate: 1.754E-04 | global batch size: 512 | lm loss: 2.909268E+00 | loss scale: 2097152.0 | grad norm: 166268.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 07:19:34,850] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=95, lr=[0.00017510855467726909, 0.00017510855467726909], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 44000/ 152972 | consumed samples: 17448384 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.751E-04 | global batch size: 512 | lm loss: 2.908723E+00 | loss scale: 2097152.0 | grad norm: 183655.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 44000 loss: 2.8704 iter time (s): 0.003 samples/sec: 171701.104
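Two throughput figures appear just above and they disagree: the DeepSpeed "steps:" line claims 0.003 s per iteration (171701 samples/s), while the Megatron line reports 6055 ms of wall clock per iteration; the two timers clearly measure different things. A back-of-the-envelope taking the Megatron number as wall clock (assuming the 64 GPUs and 2048-token sequences established earlier):

    # Effective throughput from the Megatron iteration line above.
    elapsed_s = 6.055        # elapsed time per iteration (ms) / 1000
    global_batch = 512
    seq_len = 2048
    gpus = 64

    samples_per_s = global_batch / elapsed_s        # ~84.6 samples/s
    tokens_per_s = samples_per_s * seq_len          # ~173k tokens/s overall
    print(round(tokens_per_s / gpus))               # ~2706 tokens/s per GPU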
-------------------------------------------------------------------------------------------------
validation loss at iteration 44000 | lm loss value: 2.864611E+00 | lm loss PPL: 1.754223E+01 |
-------------------------------------------------------------------------------------------------
iteration 44200/ 152972 | consumed samples: 17550784 | elapsed time per iteration (ms): 6962.6 | learning rate: 1.748E-04 | global batch size: 512 | lm loss: 2.913319E+00 | loss scale: 2097152.0 | grad norm: 198986.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44400/ 152972 | consumed samples: 17653184 | elapsed time per iteration (ms): 6049.9 | learning rate: 1.745E-04 | global batch size: 512 | lm loss: 2.918221E+00 | loss scale: 524288.0 | grad norm: 51088.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44600/ 152972 | consumed samples: 17755584 | elapsed time per iteration (ms): 6081.1 | learning rate: 1.743E-04 | global batch size: 512 | lm loss: 2.921843E+00 | loss scale: 262144.0 | grad norm: 22640.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 44800/ 152972 | consumed samples: 17857984 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.740E-04 | global batch size: 512 | lm loss: 2.923079E+00 | loss scale: 262144.0 | grad norm: 25204.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45000/ 152972 | consumed samples: 17960384 | elapsed time per iteration (ms): 6046.2 | learning rate: 1.737E-04 | global batch size: 512 | lm loss: 2.925577E+00 | loss scale: 524288.0 | grad norm: 50240.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 45000 | lm loss value: 2.870045E+00 | lm loss PPL: 1.763782E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 09:06:28,082] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step45000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1474.20
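The validation perplexity printed in these blocks is simply the exponential of the reported lm loss, as a quick check confirms:

    import math

    # PPL = exp(lm loss) for the two validation blocks above.
    for loss in (2.864611, 2.870045):
        print(f"{math.exp(loss):.6e}")
    # -> 1.754223e+01 and 1.763782e+01, matching the logged values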
saving checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 09:06:28,082] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step45000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1474.20
iteration 45200/ 152972 | consumed samples: 18062784 | elapsed time per iteration (ms): 6928.5 | learning rate: 1.734E-04 | global batch size: 512 | lm loss: 2.922794E+00 | loss scale: 262144.0 | grad norm: 26291.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45400/ 152972 | consumed samples: 18165184 | elapsed time per iteration (ms): 6055.5 | learning rate: 1.731E-04 | global batch size: 512 | lm loss: 2.926447E+00 | loss scale: 131072.0 | grad norm: 12191.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45600/ 152972 | consumed samples: 18267584 | elapsed time per iteration (ms): 6055.0 | learning rate: 1.728E-04 | global batch size: 512 | lm loss: 2.923322E+00 | loss scale: 131072.0 | grad norm: 13773.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 45800/ 152972 | consumed samples: 18369984 | elapsed time per iteration (ms): 6049.7 | learning rate: 1.725E-04 | global batch size: 512 | lm loss: 2.924240E+00 | loss scale: 131072.0 | grad norm: 12893.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 10:47:22,438] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=101, lr=[0.00017222754424386707, 0.00017222754424386707], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 46000 loss: 2.9048 iter time (s): 0.003 samples/sec: 164478.655
iteration 46000/ 152972 | consumed samples: 18472384 | elapsed time per iteration (ms): 6054.5 | learning rate: 1.722E-04 | global batch size: 512 | lm loss: 2.923149E+00 | loss scale: 262144.0 | grad norm: 26793.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 46000 | lm loss value: 2.871944E+00 | lm loss PPL: 1.767133E+01 |
-------------------------------------------------------------------------------------------------
iteration 46200/ 152972 | consumed samples: 18574784 | elapsed time per iteration (ms): 6951.2 | learning rate: 1.719E-04 | global batch size: 512 | lm loss: 2.919939E+00 | loss scale: 262144.0 | grad norm: 23854.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46400/ 152972 | consumed samples: 18677184 | elapsed time per iteration (ms): 6077.5 | learning rate: 1.716E-04 | global batch size: 512 | lm loss: 2.921011E+00 | loss scale: 524288.0 | grad norm: 48939.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 11:40:56,228] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step46500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1632.49
iteration 46600/ 152972 | consumed samples: 18779584 | elapsed time per iteration (ms): 6079.8 | learning rate: 1.713E-04 | global batch size: 512 | lm loss: 2.924048E+00 | loss scale: 524288.0 | grad norm: 48855.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 46800/ 152972 | consumed samples: 18881984 | elapsed time per iteration (ms): 6062.7 | learning rate: 1.710E-04 | global batch size: 512 | lm loss: 2.926130E+00 | loss scale: 524288.0 | grad norm: 57493.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47000/ 152972 | consumed samples: 18984384 | elapsed time per iteration (ms): 6067.8 | learning rate: 1.707E-04 | global batch size: 512 | lm loss: 2.920323E+00 | loss scale: 524288.0 | grad norm: 49518.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 47000 | lm loss value: 2.876297E+00 | lm loss PPL: 1.774843E+01 |
-------------------------------------------------------------------------------------------------
iteration 47200/ 152972 | consumed samples: 19086784 | elapsed time per iteration (ms): 6939.0 | learning rate: 1.704E-04 | global batch size: 512 | lm loss: 2.922323E+00 | loss scale: 262144.0 | grad norm: 25052.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
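Between consecutive records the "consumed samples" counter advances by exactly 200 logged iterations times the global batch size of 512:

# 200 logged iterations at global batch size 512 -> 102400 samples,
# e.g. from iteration 47000 to 47200 above:
assert 19086784 - 18984384 == 200 * 512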
iteration 47400/ 152972 | consumed samples: 19189184 | elapsed time per iteration (ms): 6048.4 | learning rate: 1.701E-04 | global batch size: 512 | lm loss: 2.918535E+00 | loss scale: 262144.0 | grad norm: 28710.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47600/ 152972 | consumed samples: 19291584 | elapsed time per iteration (ms): 6067.1 | learning rate: 1.698E-04 | global batch size: 512 | lm loss: 2.926729E+00 | loss scale: 131072.0 | grad norm: 17660.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 47800/ 152972 | consumed samples: 19393984 | elapsed time per iteration (ms): 6058.4 | learning rate: 1.695E-04 | global batch size: 512 | lm loss: 2.922502E+00 | loss scale: 65536.0 | grad norm: 6168.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 14:15:26,727] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=105, lr=[0.00016921390656551464, 0.00016921390656551464], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 48000/ 152972 | consumed samples: 19496384 | elapsed time per iteration (ms): 6069.6 | learning rate: 1.692E-04 | global batch size: 512 | lm loss: 2.917836E+00 | loss scale: 65536.0 | grad norm: 8398.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 48000 loss: 2.9386 iter time (s): 0.003 samples/sec: 171855.617
-------------------------------------------------------------------------------------------------
validation loss at iteration 48000 | lm loss value: 2.873613E+00 | lm loss PPL: 1.770086E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 14:18:22,756] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step48000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1494.75
iteration 48200/ 152972 | consumed samples: 19598784 | elapsed time per iteration (ms): 6925.8 | learning rate: 1.689E-04 | global batch size: 512 | lm loss: 2.924156E+00 | loss scale: 65536.0 | grad norm: 6228.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48400/ 152972 | consumed samples: 19701184 | elapsed time per iteration (ms): 6061.7 | learning rate: 1.686E-04 | global batch size: 512 | lm loss: 2.918235E+00 | loss scale: 32768.0 | grad norm: 3105.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48600/ 152972 | consumed samples: 19803584 | elapsed time per iteration (ms): 6051.5 | learning rate: 1.683E-04 | global batch size: 512 | lm loss: 2.917570E+00 | loss scale: 32768.0 | grad norm: 3216.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 48800/ 152972 | consumed samples: 19905984 | elapsed time per iteration (ms): 6031.3 | learning rate: 1.680E-04 | global batch size: 512 | lm loss: 2.915371E+00 | loss scale: 65536.0 | grad norm: 6360.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
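The throughput in the DeepSpeed "steps:" lines is consistent with global batch size divided by iteration time; the printed iter time is rounded to 0.003 s, which is why the division only roughly reproduces the logged samples/sec:

# samples/sec ~= global batch size / iter time; 512 / 0.003 ~= 170667,
# close to the logged 171855.617 (the unrounded iter time is ~0.002979 s).
print(512 / 0.003)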
iteration 49000/ 152972 | consumed samples: 20008384 | elapsed time per iteration (ms): 6071.3 | learning rate: 1.677E-04 | global batch size: 512 | lm loss: 2.917913E+00 | loss scale: 65536.0 | grad norm: 6754.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 49000 | lm loss value: 2.866655E+00 | lm loss PPL: 1.757812E+01 |
-------------------------------------------------------------------------------------------------
iteration 49200/ 152972 | consumed samples: 20110784 | elapsed time per iteration (ms): 6929.2 | learning rate: 1.673E-04 | global batch size: 512 | lm loss: 2.914668E+00 | loss scale: 65536.0 | grad norm: 6484.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49400/ 152972 | consumed samples: 20213184 | elapsed time per iteration (ms): 6056.4 | learning rate: 1.670E-04 | global batch size: 512 | lm loss: 2.912242E+00 | loss scale: 131072.0 | grad norm: 16315.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 16:52:36,834] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step49500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1597.94
iteration 49600/ 152972 | consumed samples: 20315584 | elapsed time per iteration (ms): 6054.3 | learning rate: 1.667E-04 | global batch size: 512 | lm loss: 2.909956E+00 | loss scale: 131072.0 | grad norm: 12037.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 49800/ 152972 | consumed samples: 20417984 | elapsed time per iteration (ms): 6073.9 | learning rate: 1.664E-04 | global batch size: 512 | lm loss: 2.909991E+00 | loss scale: 262144.0 | grad norm: 23917.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 17:43:08,825] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=106, lr=[0.00016607147703997586, 0.00016607147703997586], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 50000/ 152972 | consumed samples: 20520384 | elapsed time per iteration (ms): 6055.1 | learning rate: 1.661E-04 | global batch size: 512 | lm loss: 2.909899E+00 | loss scale: 262144.0 | grad norm: 24485.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 50000 loss: 2.9305 iter time (s): 0.003 samples/sec: 171983.492
-------------------------------------------------------------------------------------------------
validation loss at iteration 50000 | lm loss value: 2.861146E+00 | lm loss PPL: 1.748156E+01 |
-------------------------------------------------------------------------------------------------
iteration 50200/ 152972 | consumed samples: 20622784 | elapsed time per iteration (ms): 7158.2 | learning rate: 1.658E-04 | global batch size: 512 | lm loss: 2.911548E+00 | loss scale: 262144.0 | grad norm: 27667.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50400/ 152972 | consumed samples: 20725184 | elapsed time per iteration (ms): 6105.1 | learning rate: 1.654E-04 | global batch size: 512 | lm loss: 2.917201E+00 | loss scale: 65536.0 | grad norm: 7014.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50600/ 152972 | consumed samples: 20827584 | elapsed time per iteration (ms): 6044.1 | learning rate: 1.651E-04 | global batch size: 512 | lm loss: 2.908647E+00 | loss scale: 65536.0 | grad norm: 6072.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 50800/ 152972 | consumed samples: 20929984 | elapsed time per iteration (ms): 6023.8 | learning rate: 1.648E-04 | global batch size: 512 | lm loss: 2.907380E+00 | loss scale: 131072.0 | grad norm: 11268.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51000/ 152972 | consumed samples: 21032384 | elapsed time per iteration (ms): 6045.7 | learning rate: 1.645E-04 | global batch size: 512 | lm loss: 2.907558E+00 | loss scale: 131072.0 | grad norm: 13437.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 51000 | lm loss value: 2.864940E+00 | lm loss PPL: 1.754801E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 19:30:38,433] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step51000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1461.09
iteration 51200/ 152972 | consumed samples: 21134784 | elapsed time per iteration (ms): 6925.6 | learning rate: 1.641E-04 | global batch size: 512 | lm loss: 3.020271E+00 | loss scale: 16384.0 | grad norm: 13397.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51400/ 152972 | consumed samples: 21237184 | elapsed time per iteration (ms): 6037.1 | learning rate: 1.638E-04 | global batch size: 512 | lm loss: 2.932686E+00 | loss scale: 16384.0 | grad norm: 1631.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51600/ 152972 | consumed samples: 21339584 | elapsed time per iteration (ms): 6028.5 | learning rate: 1.635E-04 | global batch size: 512 | lm loss: 2.914483E+00 | loss scale: 16384.0 | grad norm: 1499.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 51800/ 152972 | consumed samples: 21441984 | elapsed time per iteration (ms): 6060.1 | learning rate: 1.632E-04 | global batch size: 512 | lm loss: 2.906503E+00 | loss scale: 32768.0 | grad norm: 3206.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-09-30 21:11:24,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=114, lr=[0.00016282239189462373, 0.00016282239189462373], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 52000/ 152972 | consumed samples: 21544384 | elapsed time per iteration (ms): 6049.2 | learning rate: 1.628E-04 | global batch size: 512 | lm loss: 2.907520E+00 | loss scale: 32768.0 | grad norm: 3090.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
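The loss scale column moves in powers of two: on an fp16 gradient overflow the optimizer step is skipped and the scale is halved, and after a long overflow-free stretch it is doubled again. That is visible above: the cumulative skipped counter in the DeepSpeed step lines rises from 106 at step 50000 to 114 at step 52000, while the scale drops as low as 16384.0 around the lm loss spike at iteration 51200. A minimal sketch of such a policy (illustrative only, not the trainer's actual implementation; growth_interval is an assumed parameter):

class DynamicLossScaler:
    # Minimal dynamic fp16 loss-scaling policy: halve on overflow (and
    # skip the step), double again after growth_interval clean steps.
    def __init__(self, scale=2.0 ** 20, growth_interval=1000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.clean_steps = 0
        self.skipped = 0

    def update(self, overflow):
        if overflow:
            self.scale = max(self.scale / 2, 1.0)
            self.clean_steps = 0
            self.skipped += 1  # reported as "skipped=..." in the log
        else:
            self.clean_steps += 1
            if self.clean_steps % self.growth_interval == 0:
                self.scale *= 2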
steps: 52000 loss: 2.8703 iter time (s): 0.003 samples/sec: 172847.037
-------------------------------------------------------------------------------------------------
validation loss at iteration 52000 | lm loss value: 2.857703E+00 | lm loss PPL: 1.742147E+01 |
-------------------------------------------------------------------------------------------------
iteration 52200/ 152972 | consumed samples: 21646784 | elapsed time per iteration (ms): 6912.4 | learning rate: 1.625E-04 | global batch size: 512 | lm loss: 2.909141E+00 | loss scale: 65536.0 | grad norm: 6690.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52400/ 152972 | consumed samples: 21749184 | elapsed time per iteration (ms): 6067.1 | learning rate: 1.622E-04 | global batch size: 512 | lm loss: 2.904358E+00 | loss scale: 65536.0 | grad norm: 6140.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-09-30 22:04:47,651] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step52500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.83
iteration 52600/ 152972 | consumed samples: 21851584 | elapsed time per iteration (ms): 6077.0 | learning rate: 1.618E-04 | global batch size: 512 | lm loss: 2.901107E+00 | loss scale: 65536.0 | grad norm: 6341.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 52800/ 152972 | consumed samples: 21953984 | elapsed time per iteration (ms): 6076.0 | learning rate: 1.615E-04 | global batch size: 512 | lm loss: 2.902674E+00 | loss scale: 131072.0 | grad norm: 12291.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53000/ 152972 | consumed samples: 22056384 | elapsed time per iteration (ms): 6040.9 | learning rate: 1.611E-04 | global batch size: 512 | lm loss: 2.903507E+00 | loss scale: 131072.0 | grad norm: 11492.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 53000 | lm loss value: 2.850659E+00 | lm loss PPL: 1.729918E+01 |
-------------------------------------------------------------------------------------------------
iteration 53200/ 152972 | consumed samples: 22158784 | elapsed time per iteration (ms): 6923.1 | learning rate: 1.608E-04 | global batch size: 512 | lm loss: 2.905187E+00 | loss scale: 262144.0 | grad norm: 24142.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53400/ 152972 | consumed samples: 22261184 | elapsed time per iteration (ms): 6050.6 | learning rate: 1.605E-04 | global batch size: 512 | lm loss: 2.901543E+00 | loss scale: 262144.0 | grad norm: 25938.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53600/ 152972 | consumed samples: 22363584 | elapsed time per iteration (ms): 6084.6 | learning rate: 1.601E-04 | global batch size: 512 | lm loss: 2.900849E+00 | loss scale: 262144.0 | grad norm: 23521.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 53800/ 152972 | consumed samples: 22465984 | elapsed time per iteration (ms): 6031.3 | learning rate: 1.598E-04 | global batch size: 512 | lm loss: 2.899153E+00 | loss scale: 524288.0 | grad norm: 45745.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 00:39:09,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=114, lr=[0.00015944839824402383, 0.00015944839824402383], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 54000 loss: 2.9207 iter time (s): 0.003 samples/sec: 170801.376
iteration 54000/ 152972 | consumed samples: 22568384 | elapsed time per iteration (ms): 6061.2 | learning rate: 1.594E-04 | global batch size: 512 | lm loss: 2.902349E+00 | loss scale: 524288.0 | grad norm: 58159.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 54000 | lm loss value: 2.850397E+00 | lm loss PPL: 1.729465E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 00:42:05,094] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step54000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1447.28
iteration 54200/ 152972 | consumed samples: 22670784 | elapsed time per iteration (ms): 6964.2 | learning rate: 1.591E-04 | global batch size: 512 | lm loss: 2.897913E+00 | loss scale: 1048576.0 | grad norm: 97564.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54400/ 152972 | consumed samples: 22773184 | elapsed time per iteration (ms): 6054.7 | learning rate: 1.588E-04 | global batch size: 512 | lm loss: 2.895984E+00 | loss scale: 524288.0 | grad norm: 44454.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54600/ 152972 | consumed samples: 22875584 | elapsed time per iteration (ms): 6064.5 | learning rate: 1.584E-04 | global batch size: 512 | lm loss: 2.897962E+00 | loss scale: 524288.0 | grad norm: 51173.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 54800/ 152972 | consumed samples: 22977984 | elapsed time per iteration (ms): 6048.2 | learning rate: 1.581E-04 | global batch size: 512 | lm loss: 2.900977E+00 | loss scale: 32768.0 | grad norm: 3153.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 54958 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 02:18:51,384] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step54958/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 54958 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1520.43
[exiting program after 1190.0495582222939 minutes] datetime: 2021-10-01 02:18:52
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
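The "[exiting program after 1190.0... minutes]" line is Megatron's time-based auto-exit: just under the job's wall-clock budget the trainer writes a final checkpoint (global_step54958 above) and stops cleanly, and the relaunched job re-emits the per-process launcher warning. A sketch of such a check, in the spirit of Megatron's --exit-duration-in-mins option (the 1190-minute budget below is inferred from the log line, and should_exit is an illustrative helper):

import time

start_time = time.time()
exit_duration_mins = 1190  # assumed budget, inferred from the exit line

def should_exit():
    # Checked periodically in the training loop: save a checkpoint and
    # exit cleanly before the cluster kills the job at its time limit.
    return (time.time() - start_time) / 60.0 >= exit_duration_mins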
2021-10-01 02:19:50.351779: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2021-10-01 02:19:50.386293: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. ...................................................... [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop nameop name ................ op name................................installed ................installed installed ....installed compatible..compatible.. ---------------------------------------------------------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............cpu_adam[YES] cpu_adam[YES] ............... ..................... ...... [YES] [YES][OKAY] [OKAY] ...... ...... [OKAY][OKAY] fused_adam fused_adam............. .............[NO]fused_adamfused_adam [NO]....... ............. ....... .............[OKAY] [OKAY][NO][NO] .......fused_lamb....... fused_lamb [OKAY] [OKAY].......................... fused_lamb[NO][NO] fused_lamb .................... .................... [OKAY] [NO] [NO][OKAY] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY] ............sparse_attn transformersparse_attn [NO]............ ........................ [NO]....... ....... [NO][OKAY][NO][OKAY] .............. 
[OKAY]transformerstochastic_transformer[OKAY] .............transformer transformer[NO] [NO] ............ ................... .......[OKAY][NO] [NO] ....... [OKAY] ....... [OKAY] stochastic_transformer [OKAY] . stochastic_transformer[NO] stochastic_transformer........ .[OKAY][NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installedsparse_attn .............. compatible[NO] --------------------------------------------------....... [OKAY] transformer ............ [NO] cpu_adam....... [OKAY]............... [YES] ...... stochastic_transformer[OKAY] . [NO] ....... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninja .................................... [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- op name op name................ ................installed ..installed compatible .. -------------------------------------------------- compatible -------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... [OKAY][YES] ...... [OKAY] fused_adam .............fused_adam [NO] .................... [OKAY][NO] ....... [OKAY] fused_lamb .............fused_lamb [NO] .................... [NO][OKAY] ....... [OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY]transformer ............ [NO]transformer ....... ............[OKAY] [NO] ....... stochastic_transformer [OKAY]. [NO] ....... stochastic_transformer[OKAY] . [NO] ....... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [NO] ....... [OKAY] fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
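In the table above, [YES] means the extension was pre-compiled when DeepSpeed was installed, while [NO] means it will be JIT-built on first use; both cases report [OKAY] because the build dependencies are satisfied. A hedged sketch of how the two cases behave, using module paths from the DeepSpeed 0.4.x tree (they may differ in other releases):

import torch
# cpu_adam was pre-built at install time ([YES] above), so constructing
# DeepSpeedCPUAdam loads an already-compiled extension; constructing
# FusedAdam ([NO] above) would instead trigger a one-off JIT build.
from deepspeed.ops.adam import DeepSpeedCPUAdam, FusedAdam

params = [torch.nn.Parameter(torch.zeros(8))]
opt = DeepSpeedCPUAdam(params)  # loads the pre-compiled cpu_adam kernel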
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
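The async_io warning is emitted by every rank: the op (used for NVMe/disk offload) needs the libaio development headers. A hedged sketch of re-checking compatibility after running `apt install libaio-dev`, with the builder path as laid out in DeepSpeed 0.4.x (it may differ in other releases):

# Hypothetical re-check after installing libaio-dev; builder location is
# an assumption based on the 0.4.x source layout.
from deepspeed.ops.op_builder import AsyncIOBuilder

print(AsyncIOBuilder().is_compatible())  # True once the libaio headers are present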
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
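The same environment fields can be reproduced directly from Python; a minimal sketch, assuming torch and deepspeed are importable in the active environment:

import torch
import deepspeed

print(torch.__file__)         # torch install path
print(torch.__version__)      # torch version, 1.8.1 here
print(torch.version.cuda)     # torch cuda version, 11.1 here
print(deepspeed.__version__)  # deepspeed info, 0.4.2+72ce55a here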
torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
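Each of the 64 ranks prints this identical report. The same fields can be reproduced from a Python shell using only public attributes (DeepSpeed also ships the ds_report CLI, which prints this ops/environment summary); a minimal sketch, assuming torch and deepspeed are importable in the same environment:

import os
import torch
import deepspeed

# Rebuild the "DeepSpeed general environment info" fields by hand.
print("torch install path .....", os.path.dirname(torch.__file__))
print("torch version ..........", torch.__version__)      # 1.8.1 in this run
print("torch cuda version .....", torch.version.cuda)     # 11.1, the toolkit torch was built against
print("deepspeed install path .", os.path.dirname(deepspeed.__file__))
print("deepspeed info .........", deepspeed.__version__)  # 0.4.2+72ce55a here

Note that the nvcc version (11.2 above) is the system toolkit, reported separately from `nvcc --version`, and may legitimately differ from the CUDA version torch was compiled with.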
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
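The git_hash=unknown lines are benign: Megatron shells out to git to stamp the run, and git is not on PATH on these compute nodes, hence the /bin/sh complaint from every rank. A sketch of that style of lookup using only the standard library (illustrative, not Megatron's exact code):

import shutil
import subprocess

def git_info():
    # Degrade to "unknown" when git is absent, as on these compute nodes.
    if shutil.which("git") is None:
        return {"git_hash": "unknown", "git_branch": "unknown"}
    run = lambda *cmd: subprocess.check_output(cmd, text=True).strip()
    return {
        "git_hash": run("git", "rev-parse", "--short", "HEAD"),
        "git_branch": run("git", "rev-parse", "--abbrev-ref", "HEAD"),
    }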
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
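The three parallelism degrees multiply out to the world size; a quick check with the numbers from this line (variable names are mine, not Megatron's):

# 64 GPUs = 4-way data * 4-way tensor * 4-way pipeline parallelism
dp, tp, pp = 4, 4, 4
assert dp * tp * pp == 64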
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1327432.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
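The rampup line compresses a whole schedule: the global batch starts at 32 and climbs by 32 until it reaches 512, with the 2,000,000-sample budget split evenly across the 15 increments. A sketch of how such a schedule plays out, assuming even spacing as in Megatron's --rampup-batch-size handling (treat this as a reading of the behavior, not the exact implementation; names are mine):

start, increment, ramp_samples = 32, 32, 2_000_000  # rampup_batch_size above
target = 512                                        # global_batch_size

num_increments = (target - start) // increment         # 15 steps of +32
samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each

def global_batch(consumed_samples):
    if consumed_samples >= ramp_samples:
        return target
    return start + increment * int(consumed_samples / samples_per_increment)

assert global_batch(0) == 32
assert global_batch(1_000_000) == 256   # 7 full increments consumed by mid-ramp
assert global_batch(2_000_000) == 512

At the full batch of 512, each optimizer step accumulates 512 / (micro_batch_size 8 * data_parallel_size 4) = 16 micro-batches per data-parallel replica.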
> building GPT2BPETokenizer tokenizer ...
> setting tensorboard ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-01 02:20:02,822] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.320 seconds
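Two of these lines encode small computations worth unpacking. The padded vocab follows Megatron's convention of rounding the vocabulary up to a multiple of make_vocab_size_divisible_by times the tensor-parallel size, and the model-parallel seed is derived from the base seed. A worked check with this run's numbers (the offset 2718 matches my reading of Megatron's model_parallel_cuda_manual_seed; take the formulas as illustrative, not an authoritative spec):

import math

# Vocab padding: round 50257 up to a multiple of 128 * tp_size = 512.
vocab_size = 50257
multiple = 128 * 4   # make_vocab_size_divisible_by * tensor_model_parallel_size
padded = math.ceil(vocab_size / multiple) * multiple
assert padded == 50688                 # new size, as logged
assert padded - vocab_size == 431      # 431 dummy tokens, as logged

# Model-parallel seed on tensor-parallel rank 0: seed + 2718 + tp_rank.
seed, tp_rank = 1234, 0
assert seed + 2718 + tp_rank == 3952   # matches the logged model parallel seed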
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
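The compiler warning fires once per rank and is harmless here, since ninja finds the fused kernels already built ("ninja: no work to do."). If the kernels ever had to rebuild, pointing the JIT build at g++ explicitly should avoid the mismatch; to the best of my reading, torch.utils.cpp_extension consults the CXX environment variable (falling back to "c++") when it writes the ninja build file. A minimal, hedged sketch:

import os

# Must be set before the first JIT extension build is triggered,
# i.e. before Megatron's fused-kernel loading runs.
os.environ.setdefault("CXX", "g++")

from torch.utils.cpp_extension import load  # same mechanism the fused kernels use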
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                               !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
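This wrong-compiler warning is emitted once per launcher process. A minimal sketch of one possible workaround, assuming the fused kernels are built through torch.utils.cpp_extension, which in this PyTorch version reads the CXX environment variable and falls back to plain c++; the snippet is illustrative and not part of the original launch scripts:

import os

# Hypothetical workaround sketch: cpp_extension picks its host compiler from $CXX
# (defaulting to "c++"), so selecting g++ explicitly should silence the mismatch
# warning above. This must run before the fused kernels are compiled.
os.environ["CXX"] = "g++"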
>>> done with compiling and loading fused kernels. Compilation time: 20.893 seconds
time to initialize megatron (seconds): 72.038
[after megatron is initialized] datetime: 2021-10-01 02:20:24
building GPT model ...
[2021-10-01 02:20:24,124] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-01 02:20:24,125] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-01 02:20:24,125] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.5 GB, percent = 21.6%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
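The topology mapping printed above is fully regular: with tensor-model-parallel size 4, data-parallel size 4, and pipeline-parallel size 4 (64 GPUs total), the model axis varies fastest and the pipe axis slowest. A small sketch that reproduces the printed ranks from the coordinates (the sizes are read off the dump; the helper name is ours):

# Sketch: rebuild the printed rank mapping from the three parallel sizes
# (model=4, data=4, pipe=4 are taken from the topology dump above).
MODEL, DATA, PIPE = 4, 4, 4

def coord_to_rank(pipe, data, model):
    # model varies fastest and pipe slowest in the printed mapping
    return (pipe * DATA + data) * MODEL + model

# Spot-check a few entries against the "Using topology" line:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 2, 3) == 27
assert coord_to_rank(3, 3, 3) == 63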
[2021-10-01 02:20:24,648] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
[2021-10-01 02:20:24,980] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-01 02:20:24,981] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-01 02:20:24,981] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.88 GB, percent = 21.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
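Each stage holds exactly six ParallelTransformerLayerPipe layers; stage 0 additionally holds the tensor-parallel embedding shard, and stage 3 holds the tied embedding plus the final MixedFusedLayerNorm, which is why the listed layer counts are 9 and 10. The printed per-rank parameter counts can be reproduced arithmetically if one assumes the 1B3 configuration's hidden size of 2048 and a padded vocabulary of 50688; neither value is printed in this excerpt, so both are assumptions here:

# Sketch of the parameter arithmetic behind the per-stage counts above.
# hidden=2048 and padded vocab=50688 are assumptions consistent with the 1B3 config.
h, tp = 2048, 4
vocab_padded = 50688

per_layer_shard = (12 * h * h + 7 * h) // tp + 6 * h  # sharded QKV/MLP + replicated LayerNorms/biases
middle_stage = 6 * per_layer_shard                    # stages 1 and 2: six transformer layers each
embed_shard = (vocab_padded // tp) * h                # vocab-parallel embedding shard
first_stage = middle_stage + embed_shard              # stage 0: embedding + six layers
last_stage = middle_stage + embed_shard + 2 * h       # stage 3 adds the final MixedFusedLayerNorm

assert middle_stage == 75_592_704
assert first_stage == 101_544_960
assert last_stage == 101_549_056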
[2021-10-01 02:20:25,000] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-01 02:20:25,068] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-01 02:20:25,068] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-01 02:20:25,069] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-01 02:20:25,069] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-01 02:20:25,069] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-01 02:20:25,069] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-01 02:20:25,069] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-01 02:20:25,069] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-01 02:20:25,069] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-01 02:20:25,069] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-01 02:20:25,321] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-01 02:20:25,321] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-01 02:20:25,321] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-01 02:20:25,321] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] activation_checkpointing_config {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] flops_profiler_config ........ {"enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null}
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-01 02:20:25,322] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-01 02:20:25,323] [INFO] [config.py:904:print] zero_config .................. {"stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false}
[2021-10-01 02:20:25,324] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-01 02:20:25,324] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-01 02:20:25,324] [INFO] [config.py:906:print] json = {"train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": {"stage": 1}, "fp16": {"enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12}, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false}
[2021-10-01 02:20:25,324] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 02:20:25,615] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
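Two consistency checks on the numbers above. First, train_batch_size is the micro batch times gradient accumulation times DeepSpeed's world_size, which here counts the 4 data-parallel replicas (8 x 16 x 4 = 512). Second, TOTAL_PARAMS exceeds UNIQUE_PARAMS by exactly one full embedding matrix, since the tied input/output embedding lives on both the first and last pipeline stage. The sketch reuses the assumed hidden size and padded vocabulary from above:

# Cross-checks for the engine.py lines above (h and vocab_padded are assumptions, as before).
micro_batch, grad_accum, dp_size = 8, 16, 4
assert micro_batch * grad_accum * dp_size == 512            # train_batch_size

h, vocab_padded = 2048, 50688
total_params, unique_params = 1_417_117_696, 1_313_308_672
assert total_params - unique_params == vocab_padded * h     # one duplicated tied embedding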
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 1
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 44
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 49
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 2
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 5
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 50
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 40
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 55
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 63
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 46
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 35
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 2
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 5
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 63
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 54958
  time (ms) | load-checkpoint: 2068.94
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
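The five checkpoint-value lines above pin down the learning-rate schedule (linear warmup to 2e-4 over 183105 steps, then cosine decay toward 1e-5), and the "4 ZeRO state_dicts" per rank is consistent with the stage-1 optimizer state being partitioned across the 4 data-parallel replicas. A minimal sketch of that schedule, assuming the logged "total number of iterations" is the decay horizon (Megatron's actual AnnealingLR has more options than this):

import math

# Sketch of the warmup + cosine schedule implied by the checkpoint values above.
MAX_LR, MIN_LR = 2.0e-4, 1.0e-5
WARMUP_ITERS, TOTAL_ITERS = 183_105, 73_242_187

def learning_rate(it):
    if it < WARMUP_ITERS:                      # linear warmup from 0 to MAX_LR
        return MAX_LR * it / WARMUP_ITERS
    if it >= TOTAL_ITERS:                      # floor after the decay horizon
        return MIN_LR
    # cosine decay from MAX_LR down to MIN_LR over the remaining iterations
    progress = (it - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))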
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP >
1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 
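For readers skimming the log: with PP > 1 and tied input/output embeddings, both the first and the last pipeline stage hold a copy of the embedding matrix, so summing per-stage counts over-counts it; that is exactly what the warning flags, and why two sets of figures appear. A minimal sketch of the over-count, with illustrative sizes (hidden/layers roughly match a 1.3B GPT config; the vocab size and the 12*h^2 per-layer formula are simplifications, not Megatron-DeepSpeed's exact accounting):

```python
# Minimal sketch (not Megatron-DeepSpeed's accounting): why summing per-stage
# parameter counts over-counts when PP > 1 and the input/output embeddings are
# tied. hidden/layers roughly match a 1.3B GPT config; vocab is an assumed
# round value and 12*h^2 ignores layernorms and biases.
hidden, layers, vocab = 2048, 24, 50304

embedding = vocab * hidden            # the tied embedding matrix
per_layer = 12 * hidden ** 2          # rough per-transformer-layer count
body = layers * per_layer             # ~1.208e9, close to the logged 1.2095e9

# With PP=2, stage 0 holds the input copy and stage 1 the tied output copy:
naive_sum = (embedding + body // 2) + (embedding + body // 2)
true_total = embedding + body         # shared weights are counted once
print(f"naive per-stage sum: {naive_sum/1e9:.3f}B vs true: {true_total/1e9:.3f}B")
```

The "without embeddings" figure is the one that stays comparable across pipeline configurations.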
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-01 02:20:27
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.100010 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.255 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.242 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.040 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-01 02:20:33
done with setup ...
training ...
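The split boundaries logged above are internally consistent. A quick check (the 949/50/1 ratio is inferred from the numbers themselves, not read from the launch command, so treat it as an assumption):

```python
# Sanity-check of the dataset split logged above. The 949/50/1 ratio is an
# assumption inferred from the numbers; Megatron's exact rounding at the
# boundaries may differ by a document or two.
total_docs = 304_230_423
train, valid, test = 288_714_672, 15_211_521, 304_230

assert train + valid + test == total_docs
for name, n in [("train", train), ("valid", valid), ("test", test)]:
    print(f"{name:<5s}: {n:>11,} docs ({n / total_docs:.4%})")
# train 94.9000%, valid 5.0000%, test 0.1000% -> consistent with a 949,50,1 split
```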
time (ms) | model-and-optimizer-setup: 3739.50 | train/valid/test-data-iterators-setup: 4548.14
Number of parameters: 1.62471936 | 1.624784896 | 1.209483264 billion (printed once per rank; the value differs with the rank's pipeline stage)
Number of parameters without embeddings: 1.209483264 | 1.2095488 billion (printed once per rank)
[before the start of training step] datetime: 2021-10-01 02:20:33
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:415:forward]
----Synchronization False
[2021-10-01 02:20:33,188] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 49] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 34] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4188.0 | max reserved: 4188.0
[Rank 50] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 18] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 2] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 19] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 3] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 35] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4332.0 | max reserved: 4332.0
[Rank 51] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 33] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 17] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 1] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5334.0 | max reserved: 5334.0
iteration 55000/ 152972 | consumed samples: 23080384 | elapsed time per iteration (ms): 6271.4 | learning rate: 1.577E-04 | global batch size: 512 | lm loss: 2.886464E+00 | loss scale: 16384.0 | grad norm: 1147.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 16] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 0] (after 55000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5350.0 | max reserved: 5350.0
[Rank 32] (after 55000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 48] (after 55000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
-------------------------------------------------------------------------------------------------
 validation loss at iteration 55000 | lm loss value: 2.839504E+00 | lm loss PPL: 1.710728E+01 |
-------------------------------------------------------------------------------------------------
iteration 55200/ 152972 | consumed samples: 23182784 | elapsed time per iteration (ms): 6895.9 | learning rate: 1.574E-04 | global batch size: 512 | lm loss: 2.881364E+00 | loss scale: 16384.0 | grad norm: 1406.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
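Each validation block carries one independent number: the PPL column is just exp of the lm loss. Checking the iteration-55000 block above:

```python
# The PPL column is exp(lm loss); checking the iteration-55000 validation block:
import math
print(math.exp(2.839504))   # ~17.10728, i.e. "lm loss PPL: 1.710728E+01"
```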
iteration 55400/ 152972 | consumed samples: 23285184 | elapsed time per iteration (ms): 6062.5 | learning rate: 1.570E-04 | global batch size: 512 | lm loss: 2.878737E+00 | loss scale: 32768.0 | grad norm: 3041.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 03:18:10,879] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step55500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1577.47
iteration 55600/ 152972 | consumed samples: 23387584 | elapsed time per iteration (ms): 6032.7 | learning rate: 1.567E-04 | global batch size: 512 | lm loss: 2.875085E+00 | loss scale: 32768.0 | grad norm: 3647.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 55800/ 152972 | consumed samples: 23489984 | elapsed time per iteration (ms): 6035.1 | learning rate: 1.563E-04 | global batch size: 512 | lm loss: 2.879582E+00 | loss scale: 32768.0 | grad norm: 3202.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 04:08:35,936] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=121, lr=[0.0001559812073726173, 0.0001559812073726173], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 56000/ 152972 | consumed samples: 23592384 | elapsed time per iteration (ms): 6070.9 | learning rate: 1.560E-04 | global batch size: 512 | lm loss: 2.877161E+00 | loss scale: 65536.0 | grad norm: 6166.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 56000 loss: 2.8667 iter time (s): 0.003 samples/sec: 171839.885
-------------------------------------------------------------------------------------------------
 validation loss at iteration 56000 | lm loss value: 2.831573E+00 | lm loss PPL: 1.697214E+01 |
-------------------------------------------------------------------------------------------------
iteration 56200/ 152972 | consumed samples: 23694784 | elapsed time per iteration (ms): 6907.0 | learning rate: 1.556E-04 | global batch size: 512 | lm loss: 2.883056E+00 | loss scale: 65536.0 | grad norm: 5827.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56400/ 152972 | consumed samples: 23797184 | elapsed time per iteration (ms): 6071.2 | learning rate: 1.553E-04 | global batch size: 512 | lm loss: 2.885647E+00 | loss scale: 131072.0 | grad norm: 13734.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56600/ 152972 | consumed samples: 23899584 | elapsed time per iteration (ms): 6062.2 | learning rate: 1.549E-04 | global batch size: 512 | lm loss: 2.882495E+00 | loss scale: 131072.0 | grad norm: 12546.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 56800/ 152972 | consumed samples: 24001984 | elapsed time per iteration (ms): 6065.0 | learning rate: 1.546E-04 | global batch size: 512 | lm loss: 2.884147E+00 | loss scale: 131072.0 | grad norm: 12638.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
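The loss-scale column climbing 16384 → 32768 → 65536 → 131072 (and later collapsing, e.g. to 8192 near iteration 58600) is standard fp16 dynamic loss scaling: the scale doubles after a stretch of overflow-free steps and halves on overflow, with the overflowing step skipped, which is what the skipped=… counters on the step lines track. A minimal sketch; the 200-step window and factor of 2 are illustrative assumptions, not this run's actual settings:

```python
# Minimal sketch of dynamic fp16 loss scaling (illustrative, not the trainer's
# actual implementation): double after `scale_window` overflow-free steps,
# halve and skip the update on overflow.
class DynamicLossScaler:
    def __init__(self, init_scale=16384.0, scale_window=200, factor=2.0):
        self.scale, self.window, self.factor = init_scale, scale_window, factor
        self.good_steps = 0

    def update(self, found_overflow: bool) -> bool:
        """Returns True if the optimizer step should be skipped."""
        if found_overflow:
            self.scale /= self.factor     # e.g. the drop to 8192 at ~58600
            self.good_steps = 0
            return True                   # counted by "skipped=..." in the log
        self.good_steps += 1
        if self.good_steps % self.window == 0:
            self.scale *= self.factor     # the 16384 -> 32768 -> ... climb
        return False
```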
iteration 57000/ 152972 | consumed samples: 24104384 | elapsed time per iteration (ms): 6064.6 | learning rate: 1.542E-04 | global batch size: 512 | lm loss: 2.889595E+00 | loss scale: 262144.0 | grad norm: 27110.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 57000 | lm loss value: 2.829968E+00 | lm loss PPL: 1.694492E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 05:55:25,432] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step57000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1535.34
iteration 57200/ 152972 | consumed samples: 24206784 | elapsed time per iteration (ms): 6943.4 | learning rate: 1.538E-04 | global batch size: 512 | lm loss: 2.885910E+00 | loss scale: 262144.0 | grad norm: 25206.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57400/ 152972 | consumed samples: 24309184 | elapsed time per iteration (ms): 6056.3 | learning rate: 1.535E-04 | global batch size: 512 | lm loss: 2.886223E+00 | loss scale: 524288.0 | grad norm: 51894.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57600/ 152972 | consumed samples: 24411584 | elapsed time per iteration (ms): 6059.5 | learning rate: 1.531E-04 | global batch size: 512 | lm loss: 2.886975E+00 | loss scale: 524288.0 | grad norm: 49056.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 57800/ 152972 | consumed samples: 24513984 | elapsed time per iteration (ms): 6094.0 | learning rate: 1.528E-04 | global batch size: 512 | lm loss: 2.884210E+00 | loss scale: 524288.0 | grad norm: 51317.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 07:36:34,582] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=125, lr=[0.00015241043912439214, 0.00015241043912439214], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 58000/ 152972 | consumed samples: 24616384 | elapsed time per iteration (ms): 6070.0 | learning rate: 1.524E-04 | global batch size: 512 | lm loss: 2.893512E+00 | loss scale: 65536.0 | grad norm: 6986.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 58000 loss: 2.9340 iter time (s): 0.003 samples/sec: 172161.277
-------------------------------------------------------------------------------------------------
 validation loss at iteration 58000 | lm loss value: 2.837630E+00 | lm loss PPL: 1.707525E+01 |
-------------------------------------------------------------------------------------------------
iteration 58200/ 152972 | consumed samples: 24718784 | elapsed time per iteration (ms): 6913.0 | learning rate: 1.520E-04 | global batch size: 512 | lm loss: 2.889378E+00 | loss scale: 65536.0 | grad norm: 6568.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58400/ 152972 | consumed samples: 24821184 | elapsed time per iteration (ms): 6059.6 | learning rate: 1.517E-04 | global batch size: 512 | lm loss: 2.884016E+00 | loss scale: 65536.0 | grad norm: 5935.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 08:29:55,521] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step58500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1512.72
iteration 58600/ 152972 | consumed samples: 24923584 | elapsed time per iteration (ms): 6061.9 | learning rate: 1.513E-04 | global batch size: 512 | lm loss: 3.008151E+00 | loss scale: 8192.0 | grad norm: 772.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 58800/ 152972 | consumed samples: 25025984 | elapsed time per iteration (ms): 6036.0 | learning rate: 1.510E-04 | global batch size: 512 | lm loss: 2.892257E+00 | loss scale: 8192.0 | grad norm: 741.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59000/ 152972 | consumed samples: 25128384 | elapsed time per iteration (ms): 6028.3 | learning rate: 1.506E-04 | global batch size: 512 | lm loss: 2.883909E+00 | loss scale: 8192.0 | grad norm: 811.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 59000 | lm loss value: 2.831567E+00 | lm loss PPL: 1.697203E+01 |
-------------------------------------------------------------------------------------------------
iteration 59200/ 152972 | consumed samples: 25230784 | elapsed time per iteration (ms): 6939.8 | learning rate: 1.502E-04 | global batch size: 512 | lm loss: 2.883289E+00 | loss scale: 16384.0 | grad norm: 1902.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59400/ 152972 | consumed samples: 25333184 | elapsed time per iteration (ms): 6057.2 | learning rate: 1.499E-04 | global batch size: 512 | lm loss: 2.885146E+00 | loss scale: 16384.0 | grad norm: 1467.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59600/ 152972 | consumed samples: 25435584 | elapsed time per iteration (ms): 6044.0 | learning rate: 1.495E-04 | global batch size: 512 | lm loss: 2.888295E+00 | loss scale: 32768.0 | grad norm: 3353.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 59800/ 152972 | consumed samples: 25537984 | elapsed time per iteration (ms): 6038.1 | learning rate: 1.491E-04 | global batch size: 512 | lm loss: 2.886164E+00 | loss scale: 32768.0 | grad norm: 3013.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 11:03:58,536] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=130, lr=[0.00014874998628833813, 0.00014874998628833813], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 60000/ 152972 | consumed samples: 25640384 | elapsed time per iteration (ms): 6041.9 | learning rate: 1.487E-04 | global batch size: 512 | lm loss: 2.884640E+00 | loss scale: 32768.0 | grad norm: 3408.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 60000 loss: 2.8949 iter time (s): 0.003 samples/sec: 172603.944
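The iteration lines are self-consistent: consumed samples advance by the global batch size each iteration, and end-to-end throughput follows from the elapsed time per iteration. (The samples/sec figure on the steps: lines appears to time only a narrow inner window, iter time (s): 0.003, and should not be read as end-to-end throughput.) Reading values off the iteration-59800/60000 lines above:

```python
# Consistency check on the iteration log lines above: consumed samples advance
# by global_batch_size per iteration; wall-clock throughput follows from the
# elapsed time per iteration.
gbs = 512
it0, s0 = 59_800, 25_537_984          # from the iteration-59800 line
it1, s1 = 60_000, 25_640_384          # from the iteration-60000 line
assert s1 - s0 == (it1 - it0) * gbs   # 200 iterations x 512 = 102,400 samples

ms_per_iter = 6041.9                  # from the iteration-60000 line
print(f"{gbs / (ms_per_iter / 1000):.1f} samples/s end-to-end")  # ~84.7
```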
-------------------------------------------------------------------------------------------------
 validation loss at iteration 60000 | lm loss value: 2.838630E+00 | lm loss PPL: 1.709233E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 11:06:52,594] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step60000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1508.74
iteration 60200/ 152972 | consumed samples: 25742784 | elapsed time per iteration (ms): 6943.2 | learning rate: 1.484E-04 | global batch size: 512 | lm loss: 2.882436E+00 | loss scale: 65536.0 | grad norm: 6413.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60400/ 152972 | consumed samples: 25845184 | elapsed time per iteration (ms): 6048.3 | learning rate: 1.480E-04 | global batch size: 512 | lm loss: 2.881582E+00 | loss scale: 65536.0 | grad norm: 6467.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60600/ 152972 | consumed samples: 25947584 | elapsed time per iteration (ms): 6120.0 | learning rate: 1.476E-04 | global batch size: 512 | lm loss: 2.882003E+00 | loss scale: 131072.0 | grad norm: 13017.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 60800/ 152972 | consumed samples: 26049984 | elapsed time per iteration (ms): 6039.3 | learning rate: 1.473E-04 | global batch size: 512 | lm loss: 2.882432E+00 | loss scale: 131072.0 | grad norm: 12026.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61000/ 152972 | consumed samples: 26152384 | elapsed time per iteration (ms): 6042.9 | learning rate: 1.469E-04 | global batch size: 512 | lm loss: 2.880471E+00 | loss scale: 131072.0 | grad norm: 12167.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 61000 | lm loss value: 2.833997E+00 | lm loss PPL: 1.701333E+01 |
-------------------------------------------------------------------------------------------------
iteration 61200/ 152972 | consumed samples: 26254784 | elapsed time per iteration (ms): 7066.4 | learning rate: 1.465E-04 | global batch size: 512 | lm loss: 2.880329E+00 | loss scale: 262144.0 | grad norm: 28449.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61400/ 152972 | consumed samples: 26357184 | elapsed time per iteration (ms): 6182.7 | learning rate: 1.461E-04 | global batch size: 512 | lm loss: 2.880880E+00 | loss scale: 262144.0 | grad norm: 24583.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 13:42:18,548] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step61500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1513.98
iteration 61600/ 152972 | consumed samples: 26459584 | elapsed time per iteration (ms): 6138.5 | learning rate: 1.458E-04 | global batch size: 512 | lm loss: 2.882300E+00 | loss scale: 524288.0 | grad norm: 51543.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 61800/ 152972 | consumed samples: 26561984 | elapsed time per iteration (ms): 6091.1 | learning rate: 1.454E-04 | global batch size: 512 | lm loss: 2.876559E+00 | loss scale: 524288.0 | grad norm: 50611.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 14:33:06,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=130, lr=[0.00014499565902863053, 0.00014499565902863053], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 62000 loss: 2.8756 iter time (s): 0.003 samples/sec: 172144.496
iteration 62000/ 152972 | consumed samples: 26664384 | elapsed time per iteration (ms): 6065.1 | learning rate: 1.450E-04 | global batch size: 512 | lm loss: 2.876181E+00 | loss scale: 524288.0 | grad norm: 48315.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 62000 | lm loss value: 2.828757E+00 | lm loss PPL: 1.692441E+01 |
-------------------------------------------------------------------------------------------------
iteration 62200/ 152972 | consumed samples: 26766784 | elapsed time per iteration (ms): 6956.0 | learning rate: 1.446E-04 | global batch size: 512 | lm loss: 2.874333E+00 | loss scale: 1048576.0 | grad norm: 95200.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62400/ 152972 | consumed samples: 26869184 | elapsed time per iteration (ms): 6104.8 | learning rate: 1.442E-04 | global batch size: 512 | lm loss: 2.872596E+00 | loss scale: 524288.0 | grad norm: 51767.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62600/ 152972 | consumed samples: 26971584 | elapsed time per iteration (ms): 6070.5 | learning rate: 1.439E-04 | global batch size: 512 | lm loss: 2.880605E+00 | loss scale: 131072.0 | grad norm: 15048.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 62800/ 152972 | consumed samples: 27073984 | elapsed time per iteration (ms): 6066.6 | learning rate: 1.435E-04 | global batch size: 512 | lm loss: 2.881883E+00 | loss scale: 131072.0 | grad norm: 12174.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63000/ 152972 | consumed samples: 27176384 | elapsed time per iteration (ms): 6082.2 | learning rate: 1.431E-04 | global batch size: 512 | lm loss: 2.874342E+00 | loss scale: 131072.0 | grad norm: 12289.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 63000 | lm loss value: 2.823852E+00 | lm loss PPL: 1.684160E+01 |
-------------------------------------------------------------------------------------------------
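One more reading aid: the grad-norm column jumps by roughly the same factor as the loss scale (compare iterations 57400 and 58000 above), which suggests the norm is reported on the scaled gradients. Dividing by the loss scale recovers a roughly constant unscaled norm:

```python
# The grad-norm column appears to track the loss scale, i.e. it is the norm of
# the *scaled* gradients. Pairs below are read off step lines in this segment.
pairs = [  # (loss scale, reported grad norm)
    (16384.0, 1147.540),     # iteration 55000
    (65536.0, 6166.710),     # iteration 56000
    (524288.0, 51894.118),   # iteration 57400
    (1048576.0, 99040.641),  # iteration 64200
]
for scale, norm in pairs:
    print(f"scale {scale:>9.0f}: unscaled grad norm ~ {norm / scale:.3f}")
# all land near ~0.07-0.10 despite the 64x spread in loss scale
```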
saving checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 16:20:15,471] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step63000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.65
iteration 63200/ 152972 | consumed samples: 27278784 | elapsed time per iteration (ms): 6925.7 | learning rate: 1.427E-04 | global batch size: 512 | lm loss: 2.871830E+00 | loss scale: 262144.0 | grad norm: 24001.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63400/ 152972 | consumed samples: 27381184 | elapsed time per iteration (ms): 6059.5 | learning rate: 1.423E-04 | global batch size: 512 | lm loss: 2.871925E+00 | loss scale: 262144.0 | grad norm: 24171.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63600/ 152972 | consumed samples: 27483584 | elapsed time per iteration (ms): 6053.4 | learning rate: 1.419E-04 | global batch size: 512 | lm loss: 2.870890E+00 | loss scale: 524288.0 | grad norm: 46657.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 63800/ 152972 | consumed samples: 27585984 | elapsed time per iteration (ms): 6059.0 | learning rate: 1.416E-04 | global batch size: 512 | lm loss: 2.872246E+00 | loss scale: 524288.0 | grad norm: 46213.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 18:01:14,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=134, lr=[0.00014117153364821304, 0.00014117153364821304], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 64000/ 152972 | consumed samples: 27688384 | elapsed time per iteration (ms): 6063.0 | learning rate: 1.412E-04 | global batch size: 512 | lm loss: 2.871957E+00 | loss scale: 524288.0 | grad norm: 48874.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 64000 loss: 2.8957 iter time (s): 0.003 samples/sec: 161495.960
-------------------------------------------------------------------------------------------------
 validation loss at iteration 64000 | lm loss value: 2.819027E+00 | lm loss PPL: 1.676053E+01 |
-------------------------------------------------------------------------------------------------
iteration 64200/ 152972 | consumed samples: 27790784 | elapsed time per iteration (ms): 6916.1 | learning rate: 1.408E-04 | global batch size: 512 | lm loss: 2.871878E+00 | loss scale: 1048576.0 | grad norm: 99040.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64400/ 152972 | consumed samples: 27893184 | elapsed time per iteration (ms): 6066.0 | learning rate: 1.404E-04 | global batch size: 512 | lm loss: 2.870983E+00 | loss scale: 1048576.0 | grad norm: 96766.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 18:54:39,878] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step64500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.90
iteration 64600/ 152972 | consumed samples: 27995584 | elapsed time per iteration (ms): 6087.3 | learning rate: 1.400E-04 | global batch size: 512 | lm loss: 2.870136E+00 | loss scale: 1048576.0 | grad norm: 98557.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 64800/ 152972 | consumed samples: 28097984 | elapsed time per iteration (ms): 6034.9 | learning rate: 1.396E-04 | global batch size: 512 | lm loss: 2.870005E+00 | loss scale: 524288.0 | grad norm: 51086.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65000/ 152972 | consumed samples: 28200384 | elapsed time per iteration (ms): 6052.6 | learning rate: 1.392E-04 | global batch size: 512 | lm loss: 2.868428E+00 | loss scale: 524288.0 | grad norm: 50395.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 65000 | lm loss value: 2.815728E+00 | lm loss PPL: 1.670533E+01 |
-------------------------------------------------------------------------------------------------
iteration 65200/ 152972 | consumed samples: 28302784 | elapsed time per iteration (ms): 6941.5 | learning rate: 1.388E-04 | global batch size: 512 | lm loss: 2.872127E+00 | loss scale: 1048576.0 | grad norm: 105698.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65400/ 152972 | consumed samples: 28405184 | elapsed time per iteration (ms): 6073.7 | learning rate: 1.385E-04 | global batch size: 512 | lm loss: 2.867939E+00 | loss scale: 1048576.0 | grad norm: 97437.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65600/ 152972 | consumed samples: 28507584 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.381E-04 | global batch size: 512 | lm loss: 2.868213E+00 | loss scale: 1048576.0 | grad norm: 95743.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 65800/ 152972 | consumed samples: 28609984 | elapsed time per iteration (ms): 6071.1 | learning rate: 1.377E-04 | global batch size: 512 | lm loss: 2.865917E+00 | loss scale: 262144.0 | grad norm: 24556.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-01 21:29:11,271] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=139, lr=[0.00013727953456626625, 0.00013727953456626625], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 66000 loss: 2.8560 iter time (s): 0.003 samples/sec: 171544.840
iteration 66000/ 152972 | consumed samples: 28712384 | elapsed time per iteration (ms): 6062.4 | learning rate: 1.373E-04 | global batch size: 512 | lm loss: 2.867659E+00 | loss scale: 262144.0 | grad norm: 25894.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 66000 | lm loss value: 2.818860E+00 | lm loss PPL: 1.675774E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 21:32:06,271] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1460.68
iteration 66200/ 152972 | consumed samples: 28814784 | elapsed time per iteration (ms): 6960.6 | learning rate: 1.369E-04 | global batch size: 512 | lm loss: 2.868319E+00 | loss scale: 262144.0 | grad norm: 24268.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 66367 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-01 22:09:15,743] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step66367/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 66367 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1559.03
[exiting program after 1190.0553922136626 minutes] datetime: 2021-10-01 22:09:16
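The checkpoint cadence is visible in the tail of this run: saves land every 1500 iterations (55500, 57000, ..., 66000), plus one off-cadence save at 66367 immediately before the timed exit. A sketch of that control flow; the flag names mirror Megatron-LM's --save-interval and --exit-duration-in-mins options, and the 1190-minute budget is read off the exit line rather than from the actual configuration, so treat both as assumptions:

```python
# Hypothetical sketch of the save/exit logic observed above, NOT the actual
# Megatron-DeepSpeed implementation. Saves fire every `save_interval`
# iterations; once the wall-clock budget is exhausted, save once more and exit
# (the off-cadence iteration-66367 save).
def checkpoint_action(iteration, minutes_elapsed,
                      save_interval=1500, exit_duration_in_mins=1190):
    if minutes_elapsed >= exit_duration_in_mins:
        return "save-and-exit"
    if iteration % save_interval == 0:
        return "save"
    return "continue"

print(checkpoint_action(66000, 1100.0))   # save          (regular cadence)
print(checkpoint_action(66200, 1150.0))   # continue
print(checkpoint_action(66367, 1190.06))  # save-and-exit
```

After the exit, the job is relaunched and the startup banners below repeat, picking up from the iteration-66367 checkpoint.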
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
(the notice above is printed once per launched process on restart)
2021-10-01 23:02:10.339225: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
(the dso_loader line above is printed once per process; the remaining copies, timestamped 23:02:10.339229 through 23:02:11.224631, are identical)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
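Each rank emits this same op report at import time; one reconstructed copy is kept above. The report can be regenerated on a login node without launching a job, as in this sketch (it assumes the ds_report console script that the DeepSpeed wheel installs, which prints the same table):

    # Reproduce the extension op report above without starting training.
    # ds_report ships with DeepSpeed and prints the op/compatibility table.
    import subprocess

    subprocess.run(["ds_report"], check=True)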
[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- op name op name ................op name ................ ................ ................installed installedinstalled installed ........ compatiblecompatiblecompatiblecompatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adamcpu_adamcpu_adam ............... .............................. ............... [YES] [YES][YES]...... [YES]............ [OKAY] [OKAY] [OKAY]...... [OKAY] fused_adam fused_adam............. .............[NO] fused_adamfused_adam.......[NO] .......................... [OKAY] .......[NO][NO] [OKAY].......fused_lamb....... [OKAY].............fused_lamb [OKAY] [NO]............. .......[NO]fused_lambfused_lamb ....................[OKAY]............. [OKAY][NO][NO] .............. [OKAY] [OKAY] sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] ....... sparse_attn[OKAY]transformer sparse_attn ........................ ............transformer[NO] [NO][NO]................... ....... ....... [NO][OKAY] [OKAY] .......[OKAY] transformer[OKAY]transformer stochastic_transformer ............ ............ .[NO]stochastic_transformer [NO] [NO] ....... ........[OKAY] ....... [OKAY] [NO] [OKAY] ....... [OKAY]stochastic_transformer stochastic_transformer .. [NO] [NO]....... .......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
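The async_io op stays [NO] because the libaio development package is absent on the nodes. A hedged pre-flight check, assuming a Debian-style system where the fix quoted in the warning (`apt install libaio-dev`) applies; note that ctypes.util.find_library only detects the runtime libaio.so, not the dev headers the JIT build also needs:

    # Check whether the runtime libaio is even present before relying
    # on DeepSpeed's async_io op; the dev headers are a separate package.
    import ctypes.util

    if ctypes.util.find_library("aio") is None:
        print("libaio missing: async_io stays [NO] until `apt install libaio-dev`")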
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- op name ................ --------------------------------------------------installed---------------------------------------------------------------------------------------------------- .. op nameop name op name compatible ................ ................-------------------------------------------------- ................ installed installed installed .. .. .. compatiblecpu_adamcompatible ...............compatible ---------------------------------------------------------------------------------------------------- [YES] --------------------------------------------------...... [OKAY] cpu_adamcpu_adam cpu_adam.............................. fused_adam ...............[YES][YES] ............. [YES] ............[NO] ......[OKAY]....... [OKAY] [OKAY][OKAY] fused_lamb ............. [NO]fused_adam fused_adam ....... fused_adam .......................... [OKAY] [NO] .............[NO] ....... [NO]....... [OKAY].......[OKAY] sparse_attn[OKAY] fused_lamb............ fused_lamb.............[NO] fused_lamb ............. [NO].................... [NO][OKAY] .......[NO] .......transformer [OKAY] .......[OKAY]............ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. sparse_attnstochastic_transformer sparse_attn............. ............sparse_attn[NO][NO] [NO]................... .............. [NO] [OKAY][OKAY][OKAY]....... [OKAY]transformer async_io ............... [NO] ....... [NO] transformer ............transformer............ [NO]............ [NO] ....... [NO] .......[OKAY] [OKAY]....... [OKAY] transformer_inference .. [NO] ....... [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer . ..[NO] [NO][NO]....... .......[OKAY]....... [OKAY][OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
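As a quick sanity check, the three parallelism degrees reported above multiply out to the world size:

# 64 ranks = data-parallel 4 x tensor-model-parallel 4 x pipeline-model-parallel 4
data_parallel, tensor_model_parallel, pipeline_model_parallel = 4, 4, 4
world_size = data_parallel * tensor_model_parallel * pipeline_model_parallel
assert world_size == 64  # matches "using world size: 64"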
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1345902.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
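A sketch of the batch size rampup announced above, assuming the usual Megatron convention for rampup_batch_size = ['32', '32', '2_000_000']: the global batch size grows from 32 to 512 in increments of 32, with the 2,000,000 ramp samples spread evenly across the increments:

start, increment, ramp_samples, target = 32, 32, 2_000_000, 512

num_increments = (target - start) // increment         # 15 growth steps
samples_per_increment = ramp_samples / num_increments  # ~133,333 samples each

for step in range(num_increments + 1):
    print(f"after ~{int(step * samples_per_increment):,} samples: "
          f"global batch size {start + step * increment}")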
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
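The padded-vocab line above follows from Megatron padding the vocabulary to a multiple of make_vocab_size_divisible_by (128) times the tensor-model-parallel size (4):

import math

orig_vocab_size = 50257
multiple = 128 * 4  # make_vocab_size_divisible_by * tensor_model_parallel_size

padded = math.ceil(orig_vocab_size / multiple) * multiple
print(padded, padded - orig_vocab_size)  # 50688 total, 431 dummy tokens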
.......[NO] . .......[OKAY] [OKAY] [NO] [OKAY] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version .....................nvcc version 11.2..................... deepspeed install path11.2 ...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...................deepspeed info 0.4.2+72ce55a, 72ce55a, big-science................... deepspeed wheel compiled w.0.4.2+72ce55a, 72ce55a, big-science ...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info deepspeed info................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... 
......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 > setting tensorboard ...  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
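The op table above is what DeepSpeed's `ds_report` utility prints, and the same information can be queried programmatically. A minimal sketch, assuming a DeepSpeed 0.4.x install where the op builders are importable from `deepspeed.ops.op_builder` (the exact import path may differ between versions):

    # Sketch: check which DeepSpeed ops can be JIT-built on this node.
    # AsyncIOBuilder.is_compatible() is what flags the missing libaio-dev above.
    from deepspeed.ops.op_builder import AsyncIOBuilder, CPUAdamBuilder, UtilsBuilder

    for builder in (AsyncIOBuilder(), CPUAdamBuilder(), UtilsBuilder()):
        print(f"{builder.NAME:<24} compatible={builder.is_compatible()}")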
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-01 23:02:20,752] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.331 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
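The "Emitting ninja build file ... / ninja: no work to do." lines come from PyTorch's JIT extension loader, which Megatron uses to build its fused kernels; "no work to do" means the cached build under megatron/fused_kernels/build was reused. A sketch of the mechanism, with illustrative file names rather than Megatron's exact invocation:

    # Illustrative: how a fused CUDA kernel gets JIT-compiled and loaded.
    # torch.utils.cpp_extension.load() writes a ninja build file into
    # build_directory and imports the compiled module.
    from torch.utils import cpp_extension

    mod = cpp_extension.load(
        name="scaled_upper_triang_masked_softmax_cuda",
        sources=[  # hypothetical paths for this sketch
            "scaled_upper_triang_masked_softmax.cpp",
            "scaled_upper_triang_masked_softmax_cuda.cu",
        ],
        build_directory="./build",  # reusing this dir yields "ninja: no work to do."
    )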
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 20.343 seconds
time to initialize megatron (seconds): -14.499
[after megatron is initialized] datetime: 2021-10-01 23:02:41
building GPT model ...
[2021-10-01 23:02:41,587] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-01 23:02:41,588] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-01 23:02:41,588] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.11 GB, percent = 20.9%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
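The 64-entry topology above is a row-major enumeration of the (pipe, data, model) grid: 4 pipeline stages x 4 data-parallel replicas x 4 tensor-parallel shards. A quick sketch that reproduces the mapping:

    # Reconstruct the ProcessCoord -> global rank mapping printed above.
    PIPE, DATA, MODEL = 4, 4, 4  # parallel degrees read off the log

    def global_rank(pipe, data, model):
        # row-major index over (pipe, data, model)
        return pipe * (DATA * MODEL) + data * MODEL + model

    assert global_rank(0, 1, 0) == 4    # ProcessCoord(pipe=0, data=1, model=0): 4
    assert global_rank(1, 0, 0) == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert global_rank(3, 3, 3) == 63   # last rank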
[2021-10-01 23:02:42,111] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
[2021-10-01 23:02:42,489] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-01 23:02:42,490] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-01 23:02:42,490] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.5 GB, percent = 21.1%
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
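These per-rank counts are internally consistent with a 24-layer GPT (6 ParallelTransformerLayerPipe per stage) at hidden size 2048, sharded 4-way tensor-parallel; the hidden size is inferred from the numbers, not printed in this excerpt. A quick check:

    # Sanity-check the per-rank parameter counts above (hidden size 2048 inferred).
    hidden, vocab, tp = 2048, 50688, 4       # padded vocab size printed earlier

    emb_shard = vocab * hidden // tp         # 25_952_256 embedding params per TP rank
    mid_stage = 75_592_704                   # 6 tp-sharded transformer layers

    assert mid_stage + emb_shard == 101_544_960               # stage 0: + input embedding
    assert mid_stage + emb_shard + 2 * hidden == 101_549_056  # stage 3: + tied embedding
                                                              #   + final LayerNorm (w, b)

    # Whole model: 4 TP shards of the 4 pipeline stages; the tied embedding is
    # held by both edge stages, so the unique count removes one full copy of it.
    total = tp * (101_544_960 + 2 * mid_stage + 101_549_056)
    assert total == 1_417_117_696                    # TOTAL_PARAMS in the engine log below
    assert total - vocab * hidden == 1_313_308_672   # UNIQUE_PARAMS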
[2021-10-01 23:02:42,509] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-01 23:02:42,580] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-01 23:02:42,581] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-01 23:02:42,581] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-01 23:02:42,581] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-01 23:02:42,581] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-01 23:02:42,581] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-01 23:02:42,581] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-01 23:02:42,581] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-01 23:02:42,581] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-01 23:02:42,581] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-01 23:02:42,815] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-01 23:02:42,815] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-01 23:02:42,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-01 23:02:42,815] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
[2021-10-01 23:02:42,817] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-01 23:02:42,818] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
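The batch-size settings in this dump are mutually consistent: DeepSpeed's train_batch_size is the product of the per-GPU micro-batch, the gradient-accumulation steps, and the data-parallel degree (the `world_size ... 4` above is the data-parallel group size, not the 64 global ranks). As a one-line check:

    # train_batch_size = micro batch x grad-accum steps x data-parallel replicas
    micro_batch_size = 8     # train_micro_batch_size_per_gpu
    micro_batches = 16       # gradient_accumulation_steps (CONFIG: micro_batches=16)
    dp_world_size = 4        # world_size in the config dump
    assert micro_batch_size * micro_batches * dp_world_size == 512  # train_batch_size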
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-01 23:02:43,109] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 34, 33, 36, 40, 35, 37, 44, 42, 38, 47, 43, 39, 45, 41, 32, 25, 24, 46, 18, 19, 17, 29, 27, 16, 26, 0, 22, 21, 28, 20, 30, 5, 23, 31, 56, 2, 6, 48, 53, 13, 14, 15, 7, 54, 4, 51, 49, 50, 55, 8, 52, 1, 58, 57, 9, 12, 11, 59, 10, 3, 60, 61, 62, 63 (one log line per rank; output interleaved across ranks)
loading 4 zero partition checkpoints for ranks 36, 34, 33, 40, 35, 38, 37, 42, 44, 41, 43, 39, 32, 47, 45, 46, 16, 25, 24, 17, 18, 19, 27, 29, 26, 28, 30, 22, 21, 20, 23, 31, 0, 5, 2 (one log line per rank)
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 62
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 66367
time (ms) | load-checkpoint: 2020.71
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
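The five "> using checkpoint value" lines restored above fully determine the learning-rate schedule: linear warmup to 2e-4 over 183,105 steps, then cosine decay to the 1e-5 floor over 73,242,187 steps (given the magnitudes, the scheduler is evidently counting samples rather than iterations). A sketch of that shape, assuming Megatron-style linear warmup plus half-cosine annealing; this is an illustration, not the project's AnnealingLR class:

    import math

    MAX_LR, MIN_LR = 2e-4, 1e-5
    WARMUP, TOTAL = 183_105, 73_242_187   # in samples, per the magnitudes above

    def lr_at(consumed_samples: int) -> float:
        """Linear warmup to MAX_LR, then half-cosine decay to MIN_LR."""
        if consumed_samples < WARMUP:
            return MAX_LR * consumed_samples / WARMUP
        progress = (consumed_samples - WARMUP) / (TOTAL - WARMUP)   # 0 -> 1
        return MIN_LR + (MAX_LR - MIN_LR) * 0.5 * (1.0 + math.cos(math.pi * progress))

    # At the resume point (28,917,184 samples consumed by iteration 66400):
    print(lr_at(28_917_184))   # ~1.362e-04, close to the 1.365E-04 logged below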
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 
1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated 
model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-01 23:02:45 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 73242187 validation: 7833600 test: 51200 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... > finished creating indexed dataset in 0.114604 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.153 seconds total number of samples: 131537224 total number of epochs: 1 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.299 seconds total number of samples: 13854322 total number of epochs: 2 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.059 seconds total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... 
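The split is exact: 288,714,672 + 15,211,521 + 304,230 = 304,230,423 documents, i.e. a 94.9% / 5.0% / 0.1% partition (consistent with a "949,50,1" split setting). The near-instant "loading" of mappings that describe 131 million training samples works because the three .npy files are memory-mapped rather than read. A conceptual sketch of how the mappings compose; the index math mirrors Megatron's GPTDataset but is simplified here, not verbatim:

    import numpy as np

    assert 288_714_672 + 15_211_521 + 304_230 == 304_230_423   # split covers every document

    prefix = ("/gpfswork/rech/six/commun/datasets-custom/oscar-en/"
              "meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_")
    doc_idx     = np.load(prefix + "doc_idx.npy",     mmap_mode="r")  # shuffled document order
    sample_idx  = np.load(prefix + "sample_idx.npy",  mmap_mode="r")  # (doc position, offset) per sample boundary
    shuffle_idx = np.load(prefix + "shuffle_idx.npy", mmap_mode="r")  # final sample-level shuffle

    def locate(sample: int):
        """Map a training-sample id to the documents and offsets backing its 2048 tokens."""
        i = shuffle_idx[sample]
        (doc_start, off_start), (doc_end, off_end) = sample_idx[i], sample_idx[i + 1]
        return doc_idx[doc_start:doc_end + 1], off_start, off_end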
time (ms) | model-and-optimizer-setup: 3730.59 | train/valid/test-data-iterators-setup: 5416.22
Number of parameters: 1.62471936 billion
Number of parameters: 1.209483264 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[after dataloaders are built] datetime: 2021-10-01 23:02:51
done with setup ...
training ...
[before the start of training step] datetime: 2021-10-01 23:02:51
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-01 23:02:51,548] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 51] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
[Rank 48] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7310.0 | max reserved: 7310.0
iteration 66400/ 152972 | consumed samples: 28917184 | elapsed time per iteration (ms): 6959.6 | learning rate: 1.365E-04 | global batch size: 512 | lm loss: 2.860609E+00 | loss scale: 524288.0 | grad norm: 37771.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 49] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 2] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 50] (after 66400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6506.0 | max reserved: 6506.0
[Rank 34] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 18] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
[Rank 19] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 35] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 3] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 16] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0
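The per-rank memory lines are dumps of PyTorch's CUDA allocator counters; "reserved" is what the caching allocator holds from the driver, which is why it never falls below "allocated". A minimal sketch of producing one such line; the torch.cuda calls are the standard API, and only the formatting mimics this log:

    import torch

    def memory_report(rank: int, iteration: int) -> str:
        mb = 1 << 20
        return (f"[Rank {rank}] (after {iteration} iterations) memory (MB) | "
                f"allocated: {torch.cuda.memory_allocated() / mb} | "
                f"max allocated: {torch.cuda.max_memory_allocated() / mb} | "
                f"reserved: {torch.cuda.memory_reserved() / mb} | "
                f"max reserved: {torch.cuda.max_memory_reserved() / mb}")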
[Rank 32] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 0] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 17] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 33] (after 66400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 1] (after 66400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5510.0 | max reserved: 5510.0
time (ms)
iteration 66600/ 152972 | consumed samples: 29019584 | elapsed time per iteration (ms): 6664.2 | learning rate: 1.361E-04 | global batch size: 512 | lm loss: 2.848031E+00 | loss scale: 524288.0 | grad norm: 41574.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 66800/ 152972 | consumed samples: 29121984 | elapsed time per iteration (ms): 6637.2 | learning rate: 1.357E-04 | global batch size: 512 | lm loss: 2.848347E+00 | loss scale: 1048576.0 | grad norm: 91936.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67000/ 152972 | consumed samples: 29224384 | elapsed time per iteration (ms): 6697.9 | learning rate: 1.353E-04 | global batch size: 512 | lm loss: 2.847244E+00 | loss scale: 1048576.0 | grad norm: 100583.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 67000 | lm loss value: 2.799806E+00 | lm loss PPL: 1.644145E+01 |
-------------------------------------------------------------------------------------------------
iteration 67200/ 152972 | consumed samples: 29326784 | elapsed time per iteration (ms): 7629.8 | learning rate: 1.349E-04 | global batch size: 512 | lm loss: 2.848878E+00 | loss scale: 1048576.0 | grad norm: 93321.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67400/ 152972 | consumed samples: 29429184 | elapsed time per iteration (ms): 6743.6 | learning rate: 1.345E-04 | global batch size: 512 | lm loss: 2.851081E+00 | loss scale: 1048576.0 | grad norm: 99034.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 01:12:22,728] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step67500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1466.90
iteration 67600/ 152972 | consumed samples: 29531584 | elapsed time per iteration (ms): 6701.2 | learning rate: 1.341E-04 | global batch size: 512 | lm loss: 2.851879E+00 | loss scale: 524288.0 | grad norm: 47312.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 67800/ 152972 | consumed samples: 29633984 | elapsed time per iteration (ms): 6736.1 | learning rate: 1.337E-04 | global batch size: 512 | lm loss: 2.853810E+00 | loss scale: 524288.0 | grad norm: 48730.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
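The reported perplexity is simply the exponential of the language-model loss, e.g. for the iteration-67000 validation above:

    import math
    print(math.exp(2.799806))   # 16.4414... == the logged lm loss PPL of 1.644145E+01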
[2021-10-02 02:08:37,295] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=142, lr=[0.0001333212789759598, 0.0001333212789759598], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 68000/ 152972 | consumed samples: 29736384 | elapsed time per iteration (ms): 6770.9 | learning rate: 1.333E-04 | global batch size: 512 | lm loss: 2.857578E+00 | loss scale: 1048576.0 | grad norm: 97677.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 68000 loss: 2.8591 iter time (s): 0.004 samples/sec: 144225.746
-------------------------------------------------------------------------------------------------
validation loss at iteration 68000 | lm loss value: 2.807224E+00 | lm loss PPL: 1.656388E+01 |
-------------------------------------------------------------------------------------------------
iteration 68200/ 152972 | consumed samples: 29838784 | elapsed time per iteration (ms): 7674.0 | learning rate: 1.329E-04 | global batch size: 512 | lm loss: 2.857237E+00 | loss scale: 1048576.0 | grad norm: 95968.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68400/ 152972 | consumed samples: 29941184 | elapsed time per iteration (ms): 6752.7 | learning rate: 1.325E-04 | global batch size: 512 | lm loss: 2.857251E+00 | loss scale: 1048576.0 | grad norm: 114154.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68600/ 152972 | consumed samples: 30043584 | elapsed time per iteration (ms): 6748.5 | learning rate: 1.321E-04 | global batch size: 512 | lm loss: 2.856095E+00 | loss scale: 1048576.0 | grad norm: 100702.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 68800/ 152972 | consumed samples: 30145984 | elapsed time per iteration (ms): 6748.0 | learning rate: 1.317E-04 | global batch size: 512 | lm loss: 2.856600E+00 | loss scale: 1048576.0 | grad norm: 101288.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69000/ 152972 | consumed samples: 30248384 | elapsed time per iteration (ms): 6737.0 | learning rate: 1.313E-04 | global batch size: 512 | lm loss: 2.857905E+00 | loss scale: 1048576.0 | grad norm: 106892.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 69000 | lm loss value: 2.805853E+00 | lm loss PPL: 1.654118E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 04:07:16,992] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step69000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1640.95
iteration 69200/ 152972 | consumed samples: 30350784 | elapsed time per iteration (ms): 7696.5 | learning rate: 1.309E-04 | global batch size: 512 | lm loss: 2.856655E+00 | loss scale: 2097152.0 | grad norm: 203035.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
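Logging happens every 200 iterations, and consumed samples advance by exactly 200 x 512 = 102,400 per interval, confirming the steady global batch size of 512; the lifetime average (29,736,384 samples over 68,000 iterations, about 437) is lower because the batch size was ramped up early in the run:

    LOG_INTERVAL, GLOBAL_BATCH = 200, 512
    assert 29_838_784 - 29_736_384 == LOG_INTERVAL * GLOBAL_BATCH   # 102,400
    print(29_736_384 / 68_000)   # ~437.3 < 512: the early ramp-up lowers the average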
iteration 69400/ 152972 | consumed samples: 30453184 | elapsed time per iteration (ms): 6722.7 | learning rate: 1.305E-04 | global batch size: 512 | lm loss: 2.855619E+00 | loss scale: 1048576.0 | grad norm: 93227.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69600/ 152972 | consumed samples: 30555584 | elapsed time per iteration (ms): 6734.7 | learning rate: 1.301E-04 | global batch size: 512 | lm loss: 2.856692E+00 | loss scale: 1048576.0 | grad norm: 92333.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 69800/ 152972 | consumed samples: 30657984 | elapsed time per iteration (ms): 6761.9 | learning rate: 1.297E-04 | global batch size: 512 | lm loss: 2.856476E+00 | loss scale: 524288.0 | grad norm: 48283.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-02 05:59:39,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=147, lr=[0.00012931232904314985, 0.00012931232904314985], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 70000 loss: 2.8355 iter time (s): 0.004 samples/sec: 143993.520
iteration 70000/ 152972 | consumed samples: 30760384 | elapsed time per iteration (ms): 6733.7 | learning rate: 1.293E-04 | global batch size: 512 | lm loss: 2.855641E+00 | loss scale: 524288.0 | grad norm: 50354.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 70000 | lm loss value: 2.804905E+00 | lm loss PPL: 1.652550E+01 |
-------------------------------------------------------------------------------------------------
iteration 70200/ 152972 | consumed samples: 30862784 | elapsed time per iteration (ms): 7686.9 | learning rate: 1.289E-04 | global batch size: 512 | lm loss: 2.854361E+00 | loss scale: 1048576.0 | grad norm: 108387.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 70400/ 152972 | consumed samples: 30965184 | elapsed time per iteration (ms): 6787.8 | learning rate: 1.285E-04 | global batch size: 512 | lm loss: 2.857964E+00 | loss scale: 1048576.0 | grad norm: 104045.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 06:59:09,488] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step70500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1653.16
iteration 70600/ 152972 | consumed samples: 31067584 | elapsed time per iteration (ms): 6778.7 | learning rate: 1.281E-04 | global batch size: 512 | lm loss: 2.855918E+00 | loss scale: 1048576.0 | grad norm: 96615.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 70800/ 152972 | consumed samples: 31169984 | elapsed time per iteration (ms): 6768.3 | learning rate: 1.277E-04 | global batch size: 512 | lm loss: 2.855535E+00 | loss scale: 2097152.0 | grad norm: 201277.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
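The loss scale bouncing between 131072 and 2097152, and the skipped counter creeping up in the log_dist lines (142 at step 68000, 147 at step 70000: five overflow steps in those 2,000 iterations), is ordinary dynamic fp16 loss scaling: halve the scale and skip the step on overflow, double it after a window of clean steps. A sketch of the policy with illustrative constants, not this run's actual config values:

    class DynamicLossScaler:
        """Sketch of halve-on-overflow / double-after-N-good-steps scaling."""
        def __init__(self, init_scale=2.0 ** 20, growth_interval=1000, min_scale=1.0):
            self.scale, self.growth_interval, self.min_scale = init_scale, growth_interval, min_scale
            self.good_steps = 0
            self.skipped = 0          # mirrors "skipped=147" in the log_dist lines

        def update(self, found_overflow: bool) -> None:
            if found_overflow:
                self.scale = max(self.scale / 2, self.min_scale)   # e.g. 1048576 -> 524288
                self.good_steps = 0
                self.skipped += 1     # this optimizer step is skipped
            else:
                self.good_steps += 1
                if self.good_steps % self.growth_interval == 0:
                    self.scale *= 2   # e.g. 1048576 -> 2097152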
iteration 71000/ 152972 | consumed samples: 31272384 | elapsed time per iteration (ms): 6773.2 | learning rate: 1.273E-04 | global batch size: 512 | lm loss: 2.855888E+00 | loss scale: 1048576.0 | grad norm: 100478.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 71000 | lm loss value: 2.805329E+00 | lm loss PPL: 1.653251E+01 |
-------------------------------------------------------------------------------------------------
iteration 71200/ 152972 | consumed samples: 31374784 | elapsed time per iteration (ms): 7726.4 | learning rate: 1.269E-04 | global batch size: 512 | lm loss: 2.850772E+00 | loss scale: 1048576.0 | grad norm: 104840.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71400/ 152972 | consumed samples: 31477184 | elapsed time per iteration (ms): 6764.4 | learning rate: 1.265E-04 | global batch size: 512 | lm loss: 2.853851E+00 | loss scale: 1048576.0 | grad norm: 105358.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71600/ 152972 | consumed samples: 31579584 | elapsed time per iteration (ms): 6740.2 | learning rate: 1.261E-04 | global batch size: 512 | lm loss: 2.854133E+00 | loss scale: 524288.0 | grad norm: 56300.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 71800/ 152972 | consumed samples: 31681984 | elapsed time per iteration (ms): 6827.1 | learning rate: 1.257E-04 | global batch size: 512 | lm loss: 2.849879E+00 | loss scale: 524288.0 | grad norm: 54489.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-02 09:51:42,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=153, lr=[0.00012525852677763017, 0.00012525852677763017], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 72000 loss: 2.8303 iter time (s): 0.004 samples/sec: 139103.238
iteration 72000/ 152972 | consumed samples: 31784384 | elapsed time per iteration (ms): 6765.0 | learning rate: 1.253E-04 | global batch size: 512 | lm loss: 2.854839E+00 | loss scale: 1048576.0 | grad norm: 95336.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 72000 | lm loss value: 2.803121E+00 | lm loss PPL: 1.649605E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 09:54:49,084] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step72000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.54
iteration 72200/ 152972 | consumed samples: 31886784 | elapsed time per iteration (ms): 7701.7 | learning rate: 1.249E-04 | global batch size: 512 | lm loss: 2.854711E+00 | loss scale: 524288.0 | grad norm: 50711.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
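Checkpoints land every 1,500 iterations (67500, 69000, 70500, 72000, ...) and take about 1.5 s to write. The saving side in sketch form, using DeepSpeed's engine API and the tag naming visible in the paths above; model_engine again stands for the deepspeed.initialize result:

    SAVE_DIR = "/gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints"

    def save(model_engine, iteration: int) -> None:
        # writes global_step{iteration}/mp_rank_00_model_states.pt plus the
        # per-rank ZeRO optimizer partition files, matching the log's paths
        model_engine.save_checkpoint(SAVE_DIR, tag=f"global_step{iteration}")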
| grad norm: 50711.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72400/ 152972 | consumed samples: 31989184 | elapsed time per iteration (ms): 6758.1 | learning rate: 1.244E-04 | global batch size: 512 | lm loss: 2.853582E+00 | loss scale: 262144.0 | grad norm: 25183.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72600/ 152972 | consumed samples: 32091584 | elapsed time per iteration (ms): 6757.0 | learning rate: 1.240E-04 | global batch size: 512 | lm loss: 2.854164E+00 | loss scale: 262144.0 | grad norm: 26579.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 72800/ 152972 | consumed samples: 32193984 | elapsed time per iteration (ms): 6738.1 | learning rate: 1.236E-04 | global batch size: 512 | lm loss: 2.852678E+00 | loss scale: 524288.0 | grad norm: 50591.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73000/ 152972 | consumed samples: 32296384 | elapsed time per iteration (ms): 6761.0 | learning rate: 1.232E-04 | global batch size: 512 | lm loss: 2.851805E+00 | loss scale: 524288.0 | grad norm: 50305.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 73000 | lm loss value: 2.800665E+00 | lm loss PPL: 1.645559E+01 | ------------------------------------------------------------------------------------------------- iteration 73200/ 152972 | consumed samples: 32398784 | elapsed time per iteration (ms): 7715.2 | learning rate: 1.228E-04 | global batch size: 512 | lm loss: 2.850456E+00 | loss scale: 262144.0 | grad norm: 25417.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73400/ 152972 | consumed samples: 32501184 | elapsed time per iteration (ms): 6737.4 | learning rate: 1.224E-04 | global batch size: 512 | lm loss: 2.852412E+00 | loss scale: 131072.0 | grad norm: 16154.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 12:46:56,877] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step73500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1537.64 iteration 73600/ 152972 | consumed samples: 32603584 | elapsed time per iteration (ms): 6805.1 | learning rate: 1.220E-04 | global batch size: 512 | lm loss: 2.856136E+00 | loss scale: 131072.0 | grad norm: 13358.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 73800/ 152972 | consumed samples: 32705984 | elapsed time per iteration (ms): 6748.5 | learning rate: 1.216E-04 | global batch size: 512 | lm loss: 2.847681E+00 | loss scale: 131072.0 | grad norm: 11654.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-02 13:43:21,396] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=158, lr=[0.00012116362616754137, 0.00012116362616754137], mom=[(0.9, 0.999), (0.9, 
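The [Rank 0] optimizer lines above show the fp16 loss scale moving only between powers of two (131072.0 up to 2097152.0) while the cumulative "skipped=" counter creeps up (153 at step 72000, 158 at step 74000). That is the signature of dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, and after a window of clean steps the scale is doubled again. Below is a minimal sketch of that rule; the constants are illustrative assumptions, not this run's actual configuration.

# Minimal sketch of dynamic loss scaling as suggested by the log above.
# init_scale and growth_interval are assumed values for illustration.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 20, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0
        self.skipped = 0  # corresponds to the log's cumulative "skipped=" counter

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should run this iteration."""
        if found_overflow:
            self.scale = max(self.scale / 2, 1.0)  # halve on inf/nan gradients
            self.good_steps = 0
            self.skipped += 1                      # the step itself is skipped
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            self.scale *= 2                        # try a larger scale again
            self.good_steps = 0
        return True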
iteration 74000/ 152972 | consumed samples: 32808384 | elapsed time per iteration (ms): 6770.6 | learning rate: 1.212E-04 | global batch size: 512 | lm loss: 2.852518E+00 | loss scale: 262144.0 | grad norm: 25712.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 74000 loss: 2.8839 iter time (s): 0.003 samples/sec: 157092.181
-------------------------------------------------------------------------------------------------
validation loss at iteration 74000 | lm loss value: 2.793049E+00 | lm loss PPL: 1.633074E+01 |
-------------------------------------------------------------------------------------------------
iteration 74200/ 152972 | consumed samples: 32910784 | elapsed time per iteration (ms): 7722.2 | learning rate: 1.208E-04 | global batch size: 512 | lm loss: 2.850579E+00 | loss scale: 262144.0 | grad norm: 25754.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74400/ 152972 | consumed samples: 33013184 | elapsed time per iteration (ms): 6794.2 | learning rate: 1.203E-04 | global batch size: 512 | lm loss: 2.850829E+00 | loss scale: 524288.0 | grad norm: 52143.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74600/ 152972 | consumed samples: 33115584 | elapsed time per iteration (ms): 6776.9 | learning rate: 1.199E-04 | global batch size: 512 | lm loss: 2.846215E+00 | loss scale: 524288.0 | grad norm: 53580.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 74800/ 152972 | consumed samples: 33217984 | elapsed time per iteration (ms): 6796.4 | learning rate: 1.195E-04 | global batch size: 512 | lm loss: 2.845712E+00 | loss scale: 524288.0 | grad norm: 52668.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75000/ 152972 | consumed samples: 33320384 | elapsed time per iteration (ms): 6791.5 | learning rate: 1.191E-04 | global batch size: 512 | lm loss: 2.846152E+00 | loss scale: 1048576.0 | grad norm: 110561.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
validation loss at iteration 75000 | lm loss value: 2.795223E+00 | lm loss PPL: 1.636627E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 15:42:38,338] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step75000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1703.03
iteration 75200/ 152972 | consumed samples: 33422784 | elapsed time per iteration (ms): 7684.7 | learning rate: 1.187E-04 | global batch size: 512 | lm loss: 2.852895E+00 | loss scale: 524288.0 | grad norm: 55204.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75400/ 152972 | consumed samples: 33525184 | elapsed time per iteration (ms): 6753.5 | learning rate: 1.183E-04 | global batch size: 512 | lm loss: 2.844674E+00 | loss scale: 524288.0 | grad norm: 49166.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75600/ 152972 | consumed samples: 33627584 | elapsed time per iteration (ms): 6782.4 | learning rate: 1.179E-04 | global batch size: 512 | lm loss: 2.847534E+00 | loss scale: 262144.0 | grad norm: 27896.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 75800/ 152972 | consumed samples: 33729984 | elapsed time per iteration (ms): 6779.8 | learning rate: 1.175E-04 | global batch size: 512 | lm loss: 2.845177E+00 | loss scale: 262144.0 | grad norm: 25938.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-02 17:35:29,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=163, lr=[0.00011703754771760277, 0.00011703754771760277], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 76000/ 152972 | consumed samples: 33832384 | elapsed time per iteration (ms): 6758.6 | learning rate: 1.170E-04 | global batch size: 512 | lm loss: 2.846029E+00 | loss scale: 262144.0 | grad norm: 25893.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 76000 loss: 2.8728 iter time (s): 0.003 samples/sec: 153587.986
-------------------------------------------------------------------------------------------------
validation loss at iteration 76000 | lm loss value: 2.792851E+00 | lm loss PPL: 1.632750E+01 |
-------------------------------------------------------------------------------------------------
iteration 76200/ 152972 | consumed samples: 33934784 | elapsed time per iteration (ms): 7714.3 | learning rate: 1.166E-04 | global batch size: 512 | lm loss: 2.845838E+00 | loss scale: 524288.0 | grad norm: 51856.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 76400/ 152972 | consumed samples: 34037184 | elapsed time per iteration (ms): 6786.1 | learning rate: 1.162E-04 | global batch size: 512 | lm loss: 2.841127E+00 | loss scale: 524288.0 | grad norm: 49069.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 18:35:09,272] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step76500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.09
iteration 76600/ 152972 | consumed samples: 34139584 | elapsed time per iteration (ms): 6820.9 | learning rate: 1.158E-04 | global batch size: 512 | lm loss: 2.843770E+00 | loss scale: 524288.0 | grad norm: 51478.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 76657 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-02 18:53:00,278] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step76657/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 76657 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1533.25
[exiting program after 1190.0646815776824 minutes] datetime: 2021-10-02 18:53:01
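Each "validation loss at iteration N" record above reports the loss twice: once as the raw lm loss and once as perplexity, which is simply exp(loss). A quick sanity check against the pairs copied from the records above:

import math

# Logged (lm loss value, lm loss PPL) pairs from the validation records above.
logged = {
    71000: (2.805329, 16.53251),
    72000: (2.803121, 16.49605),
    73000: (2.800665, 16.45559),
    74000: (2.793049, 16.33074),
    75000: (2.795223, 16.36627),
    76000: (2.792851, 16.32750),
}
for step, (loss, ppl) in logged.items():
    # PPL is exp(loss); tolerance covers the log's 6-digit rounding.
    assert abs(math.exp(loss) - ppl) < 1e-3, step
    print(f"iter {step}: exp({loss}) = {math.exp(loss):.5f} vs logged {ppl}")

Over the 71000-76657 stretch shown here, validation PPL drifts from about 16.53 down to 16.33, so the model is still improving, though slowly, at this point in training.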
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-02 18:53:18.470587: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
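The launcher warning kept above is printed once per worker and defaults OMP_NUM_THREADS to 1, so that N processes on a node do not each spawn a full complement of OpenMP threads. When intra-op CPU parallelism is actually wanted, a common manual tuning is cores-per-node divided by processes-per-node, set before the numerical libraries are imported. A hedged sketch follows; LOCAL_WORLD_SIZE is an assumed launcher variable, not something this log guarantees.

import os

# Assumption: the launcher exports LOCAL_WORLD_SIZE (processes per node).
procs_per_node = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
# Give each process an equal share of the node's cores, at least one thread.
threads = max(1, (os.cpu_count() or 1) // procs_per_node)
# setdefault respects a value already set by the user or the launcher.
os.environ.setdefault("OMP_NUM_THREADS", str(threads))

import torch  # import AFTER setting the variable so it takes effect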
------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......[YES] [OKAY]...... [OKAY] --------------------------------------------------op nameop nameop name ................op name................ ................ installed................installed installed..installed.. ..compatiblecompatible.. 
--------------------------------------------------compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ...............cpu_adamcpu_adam............... [YES] ...............[YES] ............... ...... [YES]...... [YES] [OKAY] ...... [OKAY]...... [OKAY][OKAY] fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... [OKAY] fused_lamb ............. [NO] .......fused_lamb [OKAY]............. [NO] ....... [OKAY] fused_adam fused_adam............. fused_adam .............fused_adam[NO] .................................[NO] [OKAY] [NO] sparse_attn ............ [NO] .......sparse_attn [OKAY]............ [NO] transformer....... [OKAY]............ [NO]....... ..............[OKAY] [OKAY][OKAY] [NO] .......transformer [OKAY]............ [NO] ....... stochastic_transformer[OKAY] fused_lamb fused_lambfused_lambfused_lamb............. .......................................[NO] [NO][NO] ....... [NO] .............. [OKAY] ....... [OKAY][OKAY] . [NO] .......stochastic_transformer [OKAY]. [NO] ....... [OKAY] [OKAY] sparse_attn ............ sparse_attn[NO]sparse_attnsparse_attn ........................................... [OKAY][NO][NO][NO] .....................transformer [OKAY]............[OKAY][OKAY] [NO] transformer....... transformer............[OKAY]transformer ............[NO]............ stochastic_transformer .......[NO] [NO] . [OKAY]....... ....... [NO] [OKAY] [OKAY]....... stochastic_transformer [OKAY] .stochastic_transformerstochastic_transformer [NO] . . ....... [NO] [NO] [OKAY] ....... ....... [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninja ninjaninja.................. [OKAY]...................................................... 
--------------------------------------------------[OKAY][OKAY][OKAY] --------------------------------------------------op name ---------------------------------------------------------------------------------------------------- op name................ op name................installedop name ................installed.................. ..installedcompatibleinstalled compatible .. ..-------------------------------------------------- compatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES]...............cpu_adam ......[YES]............... ............... [OKAY] ......[YES] [YES] [OKAY]...... ...... [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adam fused_adam [NO]............. fused_lamb ............. [NO]....... ............. .......[NO][OKAY] [NO] [OKAY]....... .......fused_lamb [OKAY]fused_lamb [OKAY] ............. ............. fused_lamb [NO] [NO] ............. ....... ....... [NO][OKAY] .......[OKAY]sparse_attn [OKAY]............ [NO] ....... [OKAY] sparse_attntransformer ........................sparse_attn sparse_attn [NO][NO]........................ ..............[NO][NO] [OKAY][OKAY] .............. stochastic_transformertransformer[OKAY][OKAY] ............. transformer [NO]transformer[NO] ...................................... [NO] [OKAY][NO] [OKAY] .............. stochastic_transformer [OKAY] [OKAY]. [NO] .......stochastic_transformerstochastic_transformer .[OKAY]. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja .................................... .................. [OKAY].................. [OKAY] [OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op name op name op name................ op name ................ ................installed ................ installed installed.. installed ......compatible compatiblecompatible -------------------------------------------------- --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. ...............[YES][YES] cpu_adam[YES]............ ...............[OKAY]......[OKAY] [OKAY] [YES] ...... [OKAY]fused_adam fused_adamfused_adam ............. ..........................[NO] [NO][NO]....... ..............[OKAY] [OKAY][OKAY]fused_adam fused_lamb ............. fused_lambfused_lamb[NO] ............. ............. ....................[NO] [NO][NO][OKAY]....... ..............[OKAY] [OKAY][OKAY] fused_lamb .............sparse_attn [NO] ............sparse_attnsparse_attn....... [NO] ............................... [NO] [NO][OKAY] [OKAY] ....... ....... [OKAY][OKAY] transformer transformer............transformer ............[NO]............ [NO].......[NO] .......[OKAY]....... sparse_attn[OKAY][OKAY] stochastic_transformer stochastic_transformer............ stochastic_transformer .. [NO] . [NO] [NO][NO]....... ..................... [OKAY] [OKAY][OKAY] [OKAY] transformer ............ [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op name op nameop name op name ................................................................ installed installedinstalledinstalled ...... .. compatiblecompatible compatible compatible------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... ............... ............... ...............[YES][YES] [YES] ......[YES] ...... ......[OKAY] ...... [OKAY][OKAY][OKAY] fused_adam fused_adamfused_adam.............fused_adam .............[NO]............. ............. ....... [NO][OKAY][NO][NO] .............. ....... [OKAY] fused_lamb[OKAY] [OKAY] ............. fused_lamb fused_lamb[NO]fused_lamb............. .................................[NO] [NO] [NO] .......[OKAY] ....... .......[OKAY] [OKAY][OKAY] sparse_attn sparse_attn............ sparse_attn ............sparse_attn[NO] ........................[NO] .......[NO]....... [NO] [OKAY][OKAY] ....... ....... [OKAY] [OKAY]transformertransformer ............transformer............ transformer[NO] [NO]................... ............ ....... [OKAY] [NO] [NO][OKAY] ..............stochastic_transformer [OKAY][OKAY]stochastic_transformer. .[NO] [NO]stochastic_transformer .......stochastic_transformer....... [OKAY].[OKAY] . [NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop name op name ................op name................................ installed................installedinstalled ....installed.. ..compatiblecompatiblecompatible ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] ...............cpu_adam ............... ..................... [YES] [YES][OKAY][YES]...... ......[OKAY]...... [OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ................................. ............. [NO][NO] [OKAY] [NO].............. fused_lamb ....... [OKAY].............[OKAY] [OKAY][NO] fused_lamb fused_lamb ....... ..........................fused_lamb [OKAY] [NO][NO] ............. .......[NO]....... [OKAY] ....... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformer............sparse_attn sparse_attn ............ [NO] [NO]............ ............ ....... ....... [NO][NO] .......[OKAY][OKAY]....... [OKAY][OKAY] stochastic_transformertransformer .transformer............ transformer[NO] ............[NO]................... [NO] .......[NO] [OKAY] .......[OKAY] .......[OKAY] [OKAY]stochastic_transformer .stochastic_transformer [NO]stochastic_transformer . ........ [NO][OKAY][NO] .............. [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninjaninjaninjaninja .................. ...................................................... [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name ................ ................ installed................ ................installed ..installed installed.. ..compatible..compatible compatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. cpu_adam ............... [YES] ...............[YES] [YES] ..................[YES] [OKAY][OKAY][OKAY] ...... [OKAY] fused_adamfused_adam fused_adam fused_adam............. ............. .......................... [NO][NO] [NO] [NO]....... ....... ....... ....... [OKAY][OKAY][OKAY] [OKAY] fused_lamb fused_lamb fused_lamb............. fused_lamb............. ..........................[NO] [NO] [NO]....... .......[NO]....... [OKAY] [OKAY][OKAY] ....... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............sparse_attn sparse_attntransformer [NO] ................... ........................[NO] [OKAY] [NO][NO] ....... transformer ....... ....... ............[OKAY] [NO][OKAY][OKAY] .......transformertransformer [OKAY]stochastic_transformer............ ............ . [NO]stochastic_transformer [NO] ....... [NO]. ....... [OKAY] .......[NO] [OKAY] stochastic_transformer[OKAY]....... .[OKAY]stochastic_transformer [NO] ........ [OKAY][NO] ....... [OKAY] ninjaninjaninjaninja .................. .................. .................. .................. 
[OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name-------------------------------------------------- op name ................op name ................ op nameinstalled................installed ....................installed installedcompatible.. compatible ..-------------------------------------------------- compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- cpu_adam ............... cpu_adam[YES] ...............cpu_adam...... cpu_adam [YES] [OKAY]............... --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ..................... [YES][OKAY][YES] ......fused_adam...... [OKAY].............[OKAY] [NO] fused_adam....... .............[OKAY] [NO] ....... fused_lambfused_adamfused_adam [OKAY] .......................... ............. [NO] fused_lamb[NO] [NO] ....... ............. ....... [OKAY].......[NO] [OKAY].......[OKAY] [OKAY] fused_lambfused_lamb .......................... [NO][NO] sparse_attn ....... ....... ............sparse_attn [OKAY][OKAY] [NO] ............ ....... [NO][OKAY] ....... [OKAY] transformer ............transformer sparse_attn[NO]............sparse_attn ................... ............[NO] [OKAY] [NO][NO]....... stochastic_transformer.......[OKAY] ....... . [OKAY] [NO][OKAY]stochastic_transformer transformer....... .transformer [OKAY]............[NO]............ .......[NO][NO] [OKAY].............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY] [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT-install the op.
--------------------------------------------------
JIT-compiled ops require ninja
--------------------------------------------------
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
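Each rank prints this report at startup: ops left uninstalled are JIT-compiled on first use, which requires the ninja build tool, and the async_io op additionally needs the libaio development headers (hence the `apt install libaio-dev` hint). A rough stdlib approximation of those two prerequisite checks (a sketch, not DeepSpeed's own code):

import ctypes.util
import shutil

# JIT compilation of a missing op needs the ninja build tool on PATH.
print("ninja ..................", "[OKAY]" if shutil.which("ninja") else "[MISSING]")

# async_io additionally needs the libaio shared library
# (installable via `apt install libaio-dev` on Debian/Ubuntu).
print("libaio .................", "[OKAY]" if ctypes.util.find_library("aio") else "[MISSING]")

The full table can be regenerated at any time with DeepSpeed's bundled `ds_report` command.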
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
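Every rank emits the same environment block, so a single copy carries all the information. The fields come from public APIs; a minimal sketch that reproduces them (assuming torch and deepspeed are importable and nvcc is on PATH):

import subprocess

import deepspeed
import torch

print("torch install path ...............", list(torch.__path__))
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
# nvcc is queried as an external tool; this assumes it is on PATH.
nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
print("nvcc version .....................", nvcc_out.splitlines()[-1] if nvcc_out else "unknown")
print("deepspeed info ...................", deepspeed.__version__)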
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
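`git` is not available inside the job environment (`/bin/sh: line 0: type: git: not found`), so Megatron falls back to `unknown` for both fields. Approximately what the banner does, sketched with subprocess (not Megatron's own code):

import subprocess

def _git(*args: str) -> str:
    # Ask git for the value; fall back to "unknown" when git is absent
    # or the command fails, as in this log.
    try:
        out = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
        return out.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

git_hash = _git("rev-parse", "--short", "HEAD")
git_branch = _git("rev-parse", "--abbrev-ref", "HEAD")
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")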
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1353965.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
 [WARNING]  async_io requires the libraries ['libaio-dev'], but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
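The rampup announced above (32 to 512 in increments of 32 over 2,000,000 samples, from rampup_batch_size = ['32', '32', '2_000_000']) can be read as a step function of consumed samples. A minimal sketch of that interpretation, not the exact Megatron-DeepSpeed implementation:

def rampup_global_batch_size(consumed_samples: int,
                             start: int = 32,
                             increment: int = 32,
                             ramp_samples: int = 2_000_000,
                             final: int = 512) -> int:
    # 15 increments of 32 take the global batch size from 32 to 512,
    # spread evenly over the first 2M training samples.
    n_steps = (final - start) // increment
    samples_per_step = ramp_samples // n_steps
    step = min(consumed_samples // samples_per_step, n_steps)
    return start + step * increment

assert rampup_global_batch_size(0) == 32
assert rampup_global_batch_size(2_000_000) == 512

At the full batch size, 512 = micro_batch_size (8) × data_parallel_size (4) × 16, implying 16 gradient-accumulation steps per optimizer step, if I read the configuration correctly.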
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
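The padded-vocab line follows from make_vocab_size_divisible_by = 128 and tensor_model_parallel_size = 4: the vocabulary is rounded up to a multiple of 128 × 4 = 512 so that each of the 4 tensor-parallel shards stays divisible by 128. A sketch of that arithmetic, mirroring (to the best of my reading) Megatron's vocab-padding behavior:

def pad_vocab(orig_size: int, divisible_by: int = 128, tp_size: int = 4) -> int:
    # Round the vocabulary up to the next multiple of divisible_by * tp_size.
    multiple = divisible_by * tp_size  # 512 here
    return ((orig_size + multiple - 1) // multiple) * multiple

padded = pad_vocab(50257)
assert padded == 50688 and padded - 50257 == 431  # matches the log line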
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-02 18:53:41,535] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.324 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
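The seed values in the model_parallel_cuda_manual_seed line above are consistent with Megatron deriving the tensor-parallel CUDA seed from the base seed plus a fixed offset of 2718 plus the tensor-parallel rank, while the data-parallel seed stays at the base seed. A sketch under that assumption; the 2718 offset is an assumption that happens to reproduce the logged values:

base_seed = 1234
tp_rank = 0  # tensor-model-parallel rank of the process that logged the line

# Assumed derivation: distinct CUDA seeds per tensor-parallel rank so that
# dropout patterns differ across shards, while data-parallel replicas share
# the base seed.
model_parallel_seed = base_seed + 2718 + tp_rank
data_parallel_seed = base_seed

assert model_parallel_seed == 3952 and data_parallel_seed == 1234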
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
>>> done with compiling and loading fused kernels. Compilation time: 25.079 seconds
time to initialize megatron (seconds): 46.989
[after megatron is initialized] datetime: 2021-10-02 18:54:06
building GPT model ...
[2021-10-02 18:54:07,078] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-02 18:54:07,081] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-02 18:54:07,081] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.34 GB, percent = 21.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
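The topology dump above is a 4 x 4 x 4 grid with axes ordered (pipe, data, model) and the model axis varying fastest; a small sketch that reproduces the ProcessCoord-to-rank mapping (the grid sizes come from the log, the enumeration order is inferred from the printed mapping):

    from itertools import product

    PIPE, DATA, MODEL = 4, 4, 4  # 64 ranks total

    # model varies fastest, then data, then pipe, matching the dump above
    rank_of = {
        (p, d, m): (p * DATA + d) * MODEL + m
        for p, d, m in product(range(PIPE), range(DATA), range(MODEL))
    }

    assert rank_of[(0, 0, 3)] == 3    # ProcessCoord(pipe=0, data=0, model=3): 3
    assert rank_of[(1, 0, 0)] == 16   # ProcessCoord(pipe=1, data=0, model=0): 16
    assert rank_of[(3, 3, 3)] == 63   # last coordinate in the dump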
[2021-10-02 18:54:07,603] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-02 18:54:07,976] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-02 18:54:07,977] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-02 18:54:07,977] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 39.73 GB, percent = 21.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
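The per-stage parameter counts above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a GPT architecture with hidden size 2048, 24 transformer layers, tensor-parallel size 4, a padded vocabulary of 50688 tokens, and rotary position embeddings (no learned position table); none of these values are printed in this part of the log:

    h, tp = 2048, 4

    layernorm = 2 * h                                        # gain + bias, replicated on every TP rank
    attn = (3 * h * h + 3 * h) // tp + (h * h) // tp + h     # column-parallel QKV + row-parallel output proj
    mlp = (4 * h * h + 4 * h) // tp + (4 * h * h) // tp + h  # h -> 4h -> h, both matrices split over TP
    per_layer = 2 * layernorm + attn + mlp                   # 12,598,784 parameters per layer per TP rank

    embedding = 50688 * h // tp                              # 25,952,256 per TP rank

    print(6 * per_layer)                                     # 75592704  -> stages 1 and 2 (6 layers each)
    print(6 * per_layer + embedding)                         # 101544960 -> stage 0 (adds the embedding)
    print(6 * per_layer + embedding + layernorm)             # 101549056 -> stage 3 (embedding + final LayerNorm)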
[2021-10-02 18:54:08,006] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-02 18:54:08,074] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-02 18:54:08,074] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-02 18:54:08,074] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-02 18:54:08,074] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-02 18:54:08,074] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-02 18:54:08,074] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-02 18:54:08,074] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-02 18:54:08,074] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-02 18:54:08,074] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-02 18:54:08,074] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-02 18:54:08,307] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-02 18:54:08,307] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-02 18:54:08,307] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-02 18:54:08,307] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   amp_params ................... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   dump_state ................... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-10-02 18:54:08,308] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   pld_params ................... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   world_size ................... 4
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-10-02 18:54:08,309] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-10-02 18:54:08,310] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-10-02 18:54:08,310] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
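Restated as the kind of configuration dictionary a user would hand to DeepSpeed, the effective settings in the json dump above are (values copied from the log; the comments are editorial arithmetic, and the data-parallel factor of 4 is taken from the topology earlier):

    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "train_batch_size": 512,           # 8 micro-batch x 16 grad-accum x 4 data-parallel replicas
        "gradient_clipping": 1.0,
        "zero_optimization": {"stage": 1}, # partition optimizer state over the data-parallel group
        "fp16": {
            "enabled": True,
            "loss_scale": 0,               # 0 selects dynamic loss scaling
            "loss_scale_window": 500,
            "hysteresis": 2,
            "min_loss_scale": 1,
            "initial_scale_power": 12,     # 2**12 = 4096, the initial_dynamic_scale above
        },
        "steps_per_print": 2000,
        "wall_clock_breakdown": False,
    }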
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-02 18:54:08,600] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
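The TOTAL vs UNIQUE counts in the engine lines above are consistent with one tied embedding table being counted on both the first and the last pipeline stage; a quick check (the padded vocabulary of 50688 is an assumption carried over from the earlier sketch):

    stage_params = [101544960, 75592704, 75592704, 101549056]  # per TP rank, stages 0..3
    tp = 4
    embedding = 50688 * 2048          # one full, unsplit embedding table

    total = sum(stage_params) * tp
    print(total)                      # 1417117696 -> TOTAL_PARAMS
    print(total - embedding)          # 1313308672 -> UNIQUE_PARAMS (tied copy counted once)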
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
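Each rank reports loading exactly 4 ZeRO state_dicts because ZeRO stage 1 partitions optimizer state across the data-parallel group, and this job's data-parallel degree is 4; a one-line derivation from the numbers in this log (treating "one shard per data-parallel peer" as an assumption about the elastic checkpoint layout):

    world_size, pipe, model = 64, 4, 4
    data_parallel = world_size // (pipe * model)
    print(data_parallel)  # 4 -> the "4 ZeRO state_dicts" each rank loads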
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 76657
time (ms) | load-checkpoint: 2107.03
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-02 18:54:10
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 4.935714 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.151 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.218 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.068 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-02 18:54:21
done with setup ...
training ...
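The epoch counts above can be checked against the split sizes: one pass over the train split yields 131,537,224 samples, comfortably above the 73,242,187-sample target (1 epoch), while one pass over the validation split yields 13,854,322 / 2 = 6,927,161 samples, so reaching 7,833,600 requires 2 epochs. In plain Python, with the numbers copied from the log and the 2048-token sequence length taken from the 2048sl suffix of the index-map filenames:

import math
SEQ_LEN = 2048                                    # from the ..._2048sl_... filenames
train_target, train_total, train_epochs = 73_242_187, 131_537_224, 1
valid_target, valid_total, valid_epochs = 7_833_600, 13_854_322, 2
assert math.ceil(train_target / (train_total / train_epochs)) == 1
assert math.ceil(valid_target / (valid_total / valid_epochs)) == 2
print(train_target * SEQ_LEN)                     # ~1.5e11: the train target is ~150B tokens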
time (ms) | model-and-optimizer-setup: 3819.48 | train/valid/test-data-iterators-setup: 10087.27
Number of parameters: 1.624784896 billion (per-rank values also include 1.62471936 and 1.209483264 billion)
Number of parameters without embeddings: 1.209483264 billion (some ranks report 1.2095488 billion)
[before the start of training step] datetime: 2021-10-02 18:54:21
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-02 18:54:21,721] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-02 18:54:21,721] [INFO]
[checkpointing.py:415:forward] ----Synchronization False [2021-10-02 18:54:21,721] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False [Rank 17] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0 [Rank 33] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0 [Rank 1] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0 [Rank 49] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0 [Rank 18] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4828.0 | max reserved: 4828.0 [Rank 2] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0 [Rank 34] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4220.0 | max reserved: 4220.0 [Rank 50] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7198.0 | max reserved: 7198.0 [Rank 35] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0 [Rank 19] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4764.0 | max reserved: 4764.0 [Rank 3] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0 [Rank 51] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7166.0 | max reserved: 7166.0 [Rank 48] (after 76800 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0 iteration 76800/ 152972 | consumed samples: 34241984 | elapsed time per iteration (ms): 6177.8 | learning rate: 1.154E-04 | global batch size: 512 | lm loss: 2.832041E+00 | loss scale: 1048576.0 | grad norm: 90501.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [Rank 16] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0 [Rank 32] (after 76800 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4268.0 | max reserved: 4268.0 [Rank 0] (after 76800 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5382.0 | max reserved: 5382.0 time (ms) iteration 77000/ 152972 | consumed samples: 34344384 | elapsed time per iteration (ms): 6112.0 | learning rate: 1.150E-04 | global batch size: 512 | lm loss: 2.827650E+00 | loss scale: 524288.0 | grad norm: 45092.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 77000 | lm loss value: 2.772668E+00 | lm loss PPL: 1.600127E+01 | ------------------------------------------------------------------------------------------------- iteration 77200/ 152972 | consumed samples: 34446784 | 
elapsed time per iteration (ms): 6968.0 | learning rate: 1.146E-04 | global batch size: 512 | lm loss: 2.833599E+00 | loss scale: 524288.0 | grad norm: 45544.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77400/ 152972 | consumed samples: 34549184 | elapsed time per iteration (ms): 6086.8 | learning rate: 1.141E-04 | global batch size: 512 | lm loss: 2.828930E+00 | loss scale: 524288.0 | grad norm: 45895.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77600/ 152972 | consumed samples: 34651584 | elapsed time per iteration (ms): 6092.3 | learning rate: 1.137E-04 | global batch size: 512 | lm loss: 2.828153E+00 | loss scale: 1048576.0 | grad norm: 94942.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 77800/ 152972 | consumed samples: 34753984 | elapsed time per iteration (ms): 6112.9 | learning rate: 1.133E-04 | global batch size: 512 | lm loss: 2.830373E+00 | loss scale: 524288.0 | grad norm: 48560.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-02 21:13:59,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=168, lr=[0.00011288825017492884, 0.00011288825017492884], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 78000/ 152972 | consumed samples: 34856384 | elapsed time per iteration (ms): 6102.4 | learning rate: 1.129E-04 | global batch size: 512 | lm loss: 2.833396E+00 | loss scale: 524288.0 | grad norm: 53776.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 78000 loss: 2.8409 iter time (s): 0.003 samples/sec: 167722.688 ------------------------------------------------------------------------------------------------- validation loss at iteration 78000 | lm loss value: 2.778425E+00 | lm loss PPL: 1.609365E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 21:16:51,709] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step78000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1528.11 iteration 78200/ 152972 | consumed samples: 34958784 | elapsed time per iteration (ms): 6951.7 | learning rate: 1.125E-04 | global batch size: 512 | lm loss: 2.833708E+00 | loss scale: 262144.0 | grad norm: 24060.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78400/ 152972 | consumed samples: 35061184 | elapsed time per iteration (ms): 6083.1 | learning rate: 1.121E-04 | global batch size: 512 | lm loss: 2.833099E+00 | loss scale: 262144.0 | grad norm: 25049.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78600/ 152972 | consumed samples: 35163584 | elapsed time per iteration (ms): 6088.4 | learning rate: 1.116E-04 | global batch size: 512 | lm loss: 2.833093E+00 | loss scale: 524288.0 | grad norm: 52096.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 78800/ 152972 | consumed samples: 35265984 | elapsed time per iteration (ms): 6092.4 
| learning rate: 1.112E-04 | global batch size: 512 | lm loss: 2.835002E+00 | loss scale: 262144.0 | grad norm: 29955.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79000/ 152972 | consumed samples: 35368384 | elapsed time per iteration (ms): 6093.7 | learning rate: 1.108E-04 | global batch size: 512 | lm loss: 2.834654E+00 | loss scale: 262144.0 | grad norm: 48152.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 79000 | lm loss value: 2.785615E+00 | lm loss PPL: 1.620978E+01 | ------------------------------------------------------------------------------------------------- iteration 79200/ 152972 | consumed samples: 35470784 | elapsed time per iteration (ms): 6948.7 | learning rate: 1.104E-04 | global batch size: 512 | lm loss: 2.836190E+00 | loss scale: 262144.0 | grad norm: 24593.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79400/ 152972 | consumed samples: 35573184 | elapsed time per iteration (ms): 6089.6 | learning rate: 1.100E-04 | global batch size: 512 | lm loss: 2.838174E+00 | loss scale: 524288.0 | grad norm: 52260.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-02 23:51:58,871] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step79500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1551.98 iteration 79600/ 152972 | consumed samples: 35675584 | elapsed time per iteration (ms): 6096.9 | learning rate: 1.096E-04 | global batch size: 512 | lm loss: 2.836538E+00 | loss scale: 262144.0 | grad norm: 25102.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 79800/ 152972 | consumed samples: 35777984 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.091E-04 | global batch size: 512 | lm loss: 2.851142E+00 | loss scale: 32768.0 | grad norm: 3037.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 00:42:39,533] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=176, lr=[0.00010873000690755008, 0.00010873000690755008], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 80000 loss: 2.8551 iter time (s): 0.003 samples/sec: 168457.459 iteration 80000/ 152972 | consumed samples: 35880384 | elapsed time per iteration (ms): 6073.4 | learning rate: 1.087E-04 | global batch size: 512 | lm loss: 2.837261E+00 | loss scale: 32768.0 | grad norm: 3266.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 80000 | lm loss value: 2.784462E+00 | lm loss PPL: 1.619110E+01 | ------------------------------------------------------------------------------------------------- iteration 80200/ 152972 | consumed samples: 35982784 | elapsed time per iteration (ms): 6970.1 | learning rate: 1.083E-04 | global batch size: 512 | lm loss: 
2.836796E+00 | loss scale: 65536.0 | grad norm: 6527.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80400/ 152972 | consumed samples: 36085184 | elapsed time per iteration (ms): 6086.6 | learning rate: 1.079E-04 | global batch size: 512 | lm loss: 2.832860E+00 | loss scale: 65536.0 | grad norm: 7569.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80600/ 152972 | consumed samples: 36187584 | elapsed time per iteration (ms): 6093.4 | learning rate: 1.075E-04 | global batch size: 512 | lm loss: 2.834668E+00 | loss scale: 65536.0 | grad norm: 6189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 80800/ 152972 | consumed samples: 36289984 | elapsed time per iteration (ms): 6099.6 | learning rate: 1.071E-04 | global batch size: 512 | lm loss: 2.834516E+00 | loss scale: 131072.0 | grad norm: 12411.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81000/ 152972 | consumed samples: 36392384 | elapsed time per iteration (ms): 6102.0 | learning rate: 1.066E-04 | global batch size: 512 | lm loss: 2.832582E+00 | loss scale: 131072.0 | grad norm: 12148.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 81000 | lm loss value: 2.787011E+00 | lm loss PPL: 1.623242E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 02:30:08,320] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step81000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1475.07 iteration 81200/ 152972 | consumed samples: 36494784 | elapsed time per iteration (ms): 6988.7 | learning rate: 1.062E-04 | global batch size: 512 | lm loss: 2.834314E+00 | loss scale: 262144.0 | grad norm: 23819.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81400/ 152972 | consumed samples: 36597184 | elapsed time per iteration (ms): 6094.5 | learning rate: 1.058E-04 | global batch size: 512 | lm loss: 2.832202E+00 | loss scale: 262144.0 | grad norm: 25661.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81600/ 152972 | consumed samples: 36699584 | elapsed time per iteration (ms): 6110.8 | learning rate: 1.054E-04 | global batch size: 512 | lm loss: 2.831590E+00 | loss scale: 262144.0 | grad norm: 32265.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 81800/ 152972 | consumed samples: 36801984 | elapsed time per iteration (ms): 6092.1 | learning rate: 1.050E-04 | global batch size: 512 | lm loss: 2.830193E+00 | loss scale: 524288.0 | grad norm: 48215.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 04:11:47,330] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=176, lr=[0.00010454785823469226, 0.00010454785823469226], 
mom=[(0.9, 0.999), (0.9, 0.999)] steps: 82000 loss: 2.8600 iter time (s): 0.003 samples/sec: 167891.455 iteration 82000/ 152972 | consumed samples: 36904384 | elapsed time per iteration (ms): 6101.2 | learning rate: 1.045E-04 | global batch size: 512 | lm loss: 2.830870E+00 | loss scale: 524288.0 | grad norm: 52488.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 82000 | lm loss value: 2.779595E+00 | lm loss PPL: 1.611250E+01 | ------------------------------------------------------------------------------------------------- iteration 82200/ 152972 | consumed samples: 37006784 | elapsed time per iteration (ms): 6957.0 | learning rate: 1.041E-04 | global batch size: 512 | lm loss: 2.831504E+00 | loss scale: 1048576.0 | grad norm: 99911.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 82400/ 152972 | consumed samples: 37109184 | elapsed time per iteration (ms): 6087.0 | learning rate: 1.037E-04 | global batch size: 512 | lm loss: 2.831356E+00 | loss scale: 1048576.0 | grad norm: 95679.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 05:05:24,891] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step82500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1487.34 iteration 82600/ 152972 | consumed samples: 37211584 | elapsed time per iteration (ms): 6090.4 | learning rate: 1.033E-04 | global batch size: 512 | lm loss: 2.829901E+00 | loss scale: 524288.0 | grad norm: 49833.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 82800/ 152972 | consumed samples: 37313984 | elapsed time per iteration (ms): 6091.9 | learning rate: 1.029E-04 | global batch size: 512 | lm loss: 2.830859E+00 | loss scale: 524288.0 | grad norm: 52678.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83000/ 152972 | consumed samples: 37416384 | elapsed time per iteration (ms): 6095.3 | learning rate: 1.025E-04 | global batch size: 512 | lm loss: 2.828969E+00 | loss scale: 1048576.0 | grad norm: 110760.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 83000 | lm loss value: 2.781976E+00 | lm loss PPL: 1.615091E+01 | ------------------------------------------------------------------------------------------------- iteration 83200/ 152972 | consumed samples: 37518784 | elapsed time per iteration (ms): 7014.6 | learning rate: 1.020E-04 | global batch size: 512 | lm loss: 2.829301E+00 | loss scale: 524288.0 | grad norm: 51487.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83400/ 152972 | consumed samples: 37621184 | elapsed time per iteration (ms): 6091.4 | learning rate: 1.016E-04 | global batch size: 512 | lm loss: 2.831655E+00 | loss scale: 524288.0 | grad 
norm: 49228.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83600/ 152972 | consumed samples: 37723584 | elapsed time per iteration (ms): 6094.4 | learning rate: 1.012E-04 | global batch size: 512 | lm loss: 2.827023E+00 | loss scale: 524288.0 | grad norm: 54728.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 83800/ 152972 | consumed samples: 37825984 | elapsed time per iteration (ms): 6095.5 | learning rate: 1.008E-04 | global batch size: 512 | lm loss: 2.827507E+00 | loss scale: 524288.0 | grad norm: 51983.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 07:40:49,016] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=182, lr=[0.00010037912050310452, 0.00010037912050310452], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 84000/ 152972 | consumed samples: 37928384 | elapsed time per iteration (ms): 6090.9 | learning rate: 1.004E-04 | global batch size: 512 | lm loss: 2.828560E+00 | loss scale: 524288.0 | grad norm: 49392.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 84000 loss: 2.8478 iter time (s): 0.003 samples/sec: 168112.377 ------------------------------------------------------------------------------------------------- validation loss at iteration 84000 | lm loss value: 2.782102E+00 | lm loss PPL: 1.615293E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 07:43:43,572] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step84000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1581.45 iteration 84200/ 152972 | consumed samples: 38030784 | elapsed time per iteration (ms): 6971.0 | learning rate: 9.996E-05 | global batch size: 512 | lm loss: 2.830886E+00 | loss scale: 1048576.0 | grad norm: 98218.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84400/ 152972 | consumed samples: 38133184 | elapsed time per iteration (ms): 6101.4 | learning rate: 9.955E-05 | global batch size: 512 | lm loss: 2.826230E+00 | loss scale: 1048576.0 | grad norm: 98202.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84600/ 152972 | consumed samples: 38235584 | elapsed time per iteration (ms): 6092.7 | learning rate: 9.913E-05 | global batch size: 512 | lm loss: 2.827177E+00 | loss scale: 524288.0 | grad norm: 64826.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 84800/ 152972 | consumed samples: 38337984 | elapsed time per iteration (ms): 6093.0 | learning rate: 9.871E-05 | global batch size: 512 | lm loss: 2.825721E+00 | loss scale: 262144.0 | grad norm: 24805.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85000/ 152972 | consumed samples: 38440384 | elapsed time per iteration (ms): 6099.2 | learning rate: 9.830E-05 | global batch size: 512 | lm loss: 2.825966E+00 | loss scale: 262144.0 | grad norm: 29073.945 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 85000 | lm loss value: 2.776467E+00 | lm loss PPL: 1.606218E+01 | ------------------------------------------------------------------------------------------------- iteration 85200/ 152972 | consumed samples: 38542784 | elapsed time per iteration (ms): 6994.1 | learning rate: 9.788E-05 | global batch size: 512 | lm loss: 2.827210E+00 | loss scale: 524288.0 | grad norm: 52042.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85400/ 152972 | consumed samples: 38645184 | elapsed time per iteration (ms): 6100.1 | learning rate: 9.746E-05 | global batch size: 512 | lm loss: 2.823944E+00 | loss scale: 524288.0 | grad norm: 51673.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 10:19:08,367] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step85500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1587.52 iteration 85600/ 152972 | consumed samples: 38747584 | elapsed time per iteration (ms): 6093.6 | learning rate: 9.705E-05 | global batch size: 512 | lm loss: 2.827336E+00 | loss scale: 262144.0 | grad norm: 24072.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 85800/ 152972 | consumed samples: 38849984 | elapsed time per iteration (ms): 6080.6 | learning rate: 9.663E-05 | global batch size: 512 | lm loss: 2.822873E+00 | loss scale: 262144.0 | grad norm: 27242.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 11:09:53,250] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=187, lr=[9.621720440377618e-05, 9.621720440377618e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 86000 loss: 2.8114 iter time (s): 0.003 samples/sec: 167571.225 iteration 86000/ 152972 | consumed samples: 38952384 | elapsed time per iteration (ms): 6095.5 | learning rate: 9.622E-05 | global batch size: 512 | lm loss: 2.825115E+00 | loss scale: 262144.0 | grad norm: 27694.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 86000 | lm loss value: 2.776459E+00 | lm loss PPL: 1.606205E+01 | ------------------------------------------------------------------------------------------------- iteration 86200/ 152972 | consumed samples: 39054784 | elapsed time per iteration (ms): 6942.0 | learning rate: 9.580E-05 | global batch size: 512 | lm loss: 2.822316E+00 | loss scale: 524288.0 | grad norm: 50208.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 86400/ 152972 | consumed samples: 39157184 | elapsed time per iteration (ms): 6096.6 | learning rate: 9.539E-05 | global batch size: 512 | lm loss: 2.823204E+00 | loss scale: 524288.0 | grad norm: 48510.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| time (ms) iteration 86600/ 152972 | consumed samples: 39259584 | elapsed time per iteration (ms): 6095.4 | learning rate: 9.497E-05 | global batch size: 512 | lm loss: 2.819707E+00 | loss scale: 524288.0 | grad norm: 48894.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 86800/ 152972 | consumed samples: 39361984 | elapsed time per iteration (ms): 6082.7 | learning rate: 9.456E-05 | global batch size: 512 | lm loss: 2.824777E+00 | loss scale: 1048576.0 | grad norm: 99075.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87000/ 152972 | consumed samples: 39464384 | elapsed time per iteration (ms): 6091.9 | learning rate: 9.415E-05 | global batch size: 512 | lm loss: 2.825242E+00 | loss scale: 262144.0 | grad norm: 25096.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 87000 | lm loss value: 2.765862E+00 | lm loss PPL: 1.589273E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-03 12:57:06,344] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step87000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1584.04 iteration 87200/ 152972 | consumed samples: 39566784 | elapsed time per iteration (ms): 6950.1 | learning rate: 9.373E-05 | global batch size: 512 | lm loss: 2.822462E+00 | loss scale: 262144.0 | grad norm: 26305.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87400/ 152972 | consumed samples: 39669184 | elapsed time per iteration (ms): 6096.0 | learning rate: 9.331E-05 | global batch size: 512 | lm loss: 2.821524E+00 | loss scale: 524288.0 | grad norm: 53502.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87600/ 152972 | consumed samples: 39771584 | elapsed time per iteration (ms): 6108.5 | learning rate: 9.290E-05 | global batch size: 512 | lm loss: 2.823395E+00 | loss scale: 524288.0 | grad norm: 53236.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 87800/ 152972 | consumed samples: 39873984 | elapsed time per iteration (ms): 6099.9 | learning rate: 9.248E-05 | global batch size: 512 | lm loss: 2.823653E+00 | loss scale: 524288.0 | grad norm: 51058.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-03 14:38:44,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=193, lr=[9.207430144316244e-05, 9.207430144316244e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 88000/ 152972 | consumed samples: 39976384 | elapsed time per iteration (ms): 6094.8 | learning rate: 9.207E-05 | global batch size: 512 | lm loss: 2.817372E+00 | loss scale: 524288.0 | grad norm: 49237.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 88000 loss: 2.8307 iter time (s): 0.003 samples/sec: 167734.341 
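The loss scale column above bounces between 32768 and 1048576, and the [Rank 0] step lines show a cumulative skipped count rising from 168 at step 78000 to 193 at step 88000. That is standard dynamic loss scaling for fp16 training: a step whose gradients overflow is skipped and the scale is halved; after a window of overflow-free steps the scale is doubled again. A minimal sketch of the mechanism (a generic illustration, not DeepSpeed's exact implementation; the factor of 2 and the scale window are its usual defaults):

# Sketch of dynamic fp16 loss scaling, as reflected in the log's
# "loss scale" and "skipped=" counters. Generic illustration only.
class DynamicLossScaler:
    def __init__(self, init_scale=2**20, scale_window=1000, factor=2.0):
        self.scale = init_scale
        self.scale_window = scale_window  # clean steps required before growing
        self.factor = factor
        self.clean_steps = 0
        self.skipped = 0                  # cumulative, like skipped= in the log

    def update(self, found_overflow: bool):
        if found_overflow:
            self.scale /= self.factor     # back off; the optimizer step is skipped
            self.skipped += 1
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps % self.scale_window == 0:
                self.scale *= self.factor  # probe a larger scale again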
-------------------------------------------------------------------------------------------------
validation loss at iteration 88000 | lm loss value: 2.769093E+00 | lm loss PPL: 1.594417E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 88018 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 14:43:26,021] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step88018/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 88018 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1505.18
[exiting program after 1190.093535220623 minutes] datetime: 2021-10-03 14:43:27
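The run ended by design rather than by crash: after roughly 1190 minutes it checkpointed at iteration 88018 and exited (presumably via Megatron's wall-clock limit, --exit-duration-in-mins) so the next scheduled job could resume from global_step88018. The reported perplexities throughout are simply the exponential of the lm loss; checking the final validation above:

import math
# lm loss PPL = exp(lm loss value); final validation at iteration 88000:
print(math.exp(2.769093))   # ~15.944, matching lm loss PPL: 1.594417E+01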
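The launcher banner below is emitted by the PyTorch distributed launcher for every spawned process. The default of one OpenMP thread per process avoids oversubscription when many ranks share a node; if CPU-side work such as data loading or tokenization is the bottleneck, OMP_NUM_THREADS can be raised explicitly before the processes import torch or numpy. A sketch (the value 4 is an arbitrary example):

import os
# Opt out of the launcher's conservative default of one OpenMP thread per
# process; must run before torch/numpy initialize their thread pools.
os.environ.setdefault("OMP_NUM_THREADS", "4")   # 4 is an arbitrary example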
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-03 14:44:07.912475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
.................................... .................. [OKAY][OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................................................ ................ installed installedinstalled installed .. .. .... compatible compatible compatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES]............... ............... ............... ......[YES] [YES] ......[YES] [OKAY] ......[OKAY] [OKAY]...... [OKAY] fused_adam ............. fused_adam[NO] ....................fused_adam fused_adam [OKAY][NO] .......................... ....... [NO][NO][OKAY]fused_lamb ........................... fused_lamb[OKAY][OKAY][NO] .................... [OKAY][NO]fused_lambfused_lamb ................................. [OKAY][NO] [NO]....... ....... sparse_attn[OKAY][OKAY] ............ [NO]sparse_attn ....... ............[OKAY] [NO] .......sparse_attn [OKAY]transformer............ ............ transformersparse_attn[NO][NO] ............ ....... ...................[NO] [OKAY] [OKAY] .......[NO] [OKAY]transformer stochastic_transformer ....... ............ stochastic_transformer. [OKAY] [NO].[NO] .......[NO]....... [OKAY]transformer.......[OKAY] ............[OKAY] stochastic_transformer .[NO] .......[NO] [OKAY]....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] ninjaninjaninja ...................................................... [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name ninjaop name................................ ................installed..................installed .. installed[OKAY] .. compatible.. --------------------------------------------------compatible --------------------------------------------------compatible op name-------------------------------------------------- -------------------------------------------------- ................ installed cpu_adam.. ...............cpu_adamcpu_adamcompatible [YES]..............................-------------------------------------------------- ......[YES] [YES] [OKAY]...... ......[OKAY] [OKAY] cpu_adam ............... fused_adam[YES] .............fused_adam...... fused_adam............. [NO] [OKAY][NO].................... .......[NO][OKAY] [OKAY]....... fused_adam[OKAY]fused_lambfused_lamb ....................................... fused_lamb[NO][NO] [NO] ........................... ....... [OKAY][NO] [OKAY] [OKAY] ....... [OKAY]fused_lamb ............. [NO] ....... [OKAY] sparse_attn ............ [NO]sparse_attn ................... [OKAY][NO]sparse_attn ................... sparse_attntransformer [OKAY] [NO] ........................ .......[NO]transformer[NO] .......[OKAY]............ ....... [OKAY][NO][OKAY] transformer .......stochastic_transformer............transformer [OKAY] . [NO]............ [NO] stochastic_transformer .............. [NO][OKAY].[OKAY] [NO]....... stochastic_transformer ....... [OKAY] .[OKAY] [NO] .......stochastic_transformer [OKAY] . [NO] ....... 
[OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name --------------------------------------------------op name ................ op name op name................ installed ................installed ................ installedinstalled .. .. .. compatible..compatible compatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES]cpu_adam ................................................... [OKAY][YES][YES] [YES] ...... ...... ...... [OKAY] [OKAY] [OKAY]fused_adam ............. [NO] ....... [OKAY] fused_adam .............fused_adamfused_lambfused_adam [NO] .......................... ............. ....... [NO] [NO] [NO][OKAY] ....... .............. [OKAY][OKAY]fused_lamb [OKAY] ............. [NO]fused_lamb fused_lamb ....... ............. [OKAY]............. sparse_attn [NO] ............[NO] ....... [NO] .......[OKAY] .......[OKAY] sparse_attn [OKAY] ............ [NO]transformer ................... [OKAY][NO] .......sparse_attn transformer [OKAY] sparse_attn............ ............ [NO]............stochastic_transformer[NO] .......[NO] .[OKAY]....... .......[OKAY][NO] stochastic_transformer[OKAY]....... [OKAY].transformer transformer [NO] ............................... [NO][NO][OKAY] .............. [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. 
[OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op nameop name ................op name ................ installed................ ................installed installed .. installed ....compatible .. compatible-------------------------------------------------- compatible compatible -------------------------------------------------- -------------------------------------------------- --------------------------------------------------cpu_adam ............... [YES] ...... cpu_adam[OKAY]cpu_adam cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... ...... ......[OKAY]fused_adam [OKAY][OKAY]............. [NO] ....... [OKAY] fused_adam fused_lambfused_adam............. fused_adam............. ............. [NO].............[NO] [NO] .............. .......[NO] [OKAY] [OKAY]....... [OKAY]fused_lambfused_lamb[OKAY] .......................... [NO]fused_lamb [NO]....... sparse_attn.............[OKAY]....... [NO][OKAY] ............ .......[NO] [OKAY]....... [OKAY] sparse_attn ............transformer [NO]sparse_attn............ .......[NO]............ sparse_attn....... [NO][OKAY] ............ [OKAY] ....... [NO] transformer[OKAY]....... stochastic_transformer ............[OKAY] . transformer[NO] [NO] .......transformer ...................[OKAY] ............ [NO][OKAY] stochastic_transformer[NO]....... ........ [OKAY] [NO] [OKAY] ....... stochastic_transformer[OKAY] stochastic_transformer. [NO]. .......[NO] [OKAY]....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop nameop name op name................................................ installed installed................ installed .. .. installed.. 
compatible compatible..-------------------------------------------------- compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES]...............cpu_adam [YES] ...... .................................... [OKAY] [OKAY][YES] [YES] ...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... fused_adamfused_adam[NO] [NO] ............. ........................... [NO][OKAY][NO][OKAY] ..............fused_lamb [OKAY]fused_lamb.............[OKAY] .............[NO] [NO]fused_lamb....... fused_lamb ....... [OKAY] .......................... [OKAY] [NO][NO] .............. [OKAY][OKAY] sparse_attnsparse_attn ........................ sparse_attn[NO]sparse_attn[NO] .......................... ............ [OKAY] [OKAY] [NO] [NO] transformer.............. transformer ............[OKAY]............ [OKAY] [NO] [NO] .......transformer transformer....... [OKAY] ............ ............[OKAY] [NO][NO] stochastic_transformer....... stochastic_transformer........ [OKAY][OKAY].[NO] [NO]....... stochastic_transformer.......[OKAY] stochastic_transformer [OKAY] .. [NO][NO] .............. [OKAY][OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name ................op name ................ installed ................................ installed .. installedinstalled .. compatible .... compatible--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam......cpu_adam ...............[OKAY]cpu_adam ............... [YES] ...............[YES]...... [YES]fused_adam......[OKAY] ......[OKAY]............. [OKAY][NO] fused_adam....... .............[OKAY] [NO] .......fused_adam fused_lamb[OKAY] .............fused_adam ............. [NO].............[NO] fused_lamb .............. [NO] .............[OKAY][OKAY] ....... [NO][OKAY] ....... fused_lamb [OKAY]fused_lamb............. sparse_attn.............[NO] ............[NO]....... [NO]....... [OKAY]....... [OKAY][OKAY]sparse_attn ............transformer [NO] ................... [NO][OKAY] ....... sparse_attn[OKAY]transformer sparse_attn........................ [NO]stochastic_transformer............ [NO] . ....... [NO]....... [NO] [OKAY][OKAY].............. [OKAY][OKAY] stochastic_transformertransformer transformer............. ............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] stochastic_transformer .stochastic_transformer [NO]. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY] --------------------------------------------------[OKAY] ---------------------------------------------------------------------------------------------------- op name--------------------------------------------------op name op name ................................................ installedop nameinstalled installed .. .................. .. compatible installedcompatible compatible -------------------------------------------------- -------------------------------------------------- .. -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES]cpu_adam .......................................... [OKAY] [OKAY][YES] [YES]...... ......[OKAY] [OKAY] fused_adamfused_adam .......................... [NO][NO]fused_adamfused_adam ........................................ [OKAY][OKAY][NO] [NO] fused_lamb.............. fused_lamb ............. [OKAY][OKAY].............[NO] [NO].......fused_lambfused_lamb [OKAY]................................. [OKAY][NO][NO] .............. [OKAY][OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY] sparse_attn ............sparse_attn transformer[NO]........................ ................... [NO][OKAY][NO][NO] .............. ....... transformer[OKAY] [OKAY][OKAY] ............ [NO]transformer stochastic_transformer....... transformer............. [OKAY] [NO][NO] ............ .......[NO].......stochastic_transformer [OKAY][OKAY]........ [NO] [OKAY].......stochastic_transformer [OKAY]. stochastic_transformer [NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................ ................................................ installedinstalled installedinstalled.. compatible ...... 
-------------------------------------------------- compatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES] ...... cpu_adamcpu_adamcpu_adam [OKAY] ............................................. [YES][YES][YES] ............fused_adam...... .............[OKAY] [OKAY] [OKAY][NO] ....... [OKAY] fused_adamfused_lambfused_adam .......................................fused_adam [NO] [NO]............. [NO] ....... ....... [NO]....... [OKAY] [OKAY] .......[OKAY] [OKAY] fused_lambfused_lamb fused_lamb ............. ............. sparse_attn [NO].............[NO] ............ .............. [NO][OKAY][OKAY][NO] .............. [OKAY][OKAY] sparse_attntransformersparse_attn .................................... sparse_attn[NO][NO] [NO] .......................... .......[OKAY][OKAY] [NO] [OKAY]....... transformer stochastic_transformer transformer[OKAY]............ ............[NO]. [NO] transformer.......[NO] ....... ................... [OKAY][OKAY] [NO][OKAY]stochastic_transformer stochastic_transformer....... .. [OKAY][NO][NO] ..............stochastic_transformer [OKAY][OKAY]. [NO] ....... [OKAY] ninjaninjaninja ninja...................................................... ..................[OKAY][OKAY][OKAY] [OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name op name op nameop name ................ ................................................installed installed..installedinstalled ....compatible.. compatiblecompatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adamcpu_adam cpu_adam ...... ............... ............... [YES]............... [OKAY][YES]...... [YES] ......[OKAY]...... [OKAY][OKAY]fused_adam ............. [NO] ....... [OKAY]fused_adam .............fused_adam [NO].............fused_lambfused_adam .......[NO].......................... [OKAY] .......[NO][NO] [OKAY].............. fused_lamb [OKAY] [OKAY]fused_lamb ............. .............[NO] fused_lamb[NO]....... ....................[OKAY] sparse_attn [OKAY] [NO] ................... [NO][OKAY] ....... [OKAY] sparse_attn transformer............ sparse_attn............[NO] ............sparse_attn.......[NO] [NO] [OKAY]................... [NO] [OKAY] .......transformer....... ............[OKAY][OKAY] stochastic_transformer [NO] transformer........ transformer[OKAY][NO] ............ ...................[NO] stochastic_transformer[NO] [OKAY]............... [NO] [OKAY][OKAY] ....... [OKAY] stochastic_transformerstochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
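This op report is what DeepSpeed's ds_report utility prints once per rank. A minimal sketch of how to regenerate it and, optionally, pre-build the ops shown as [NO] so they are not JIT-compiled at startup; the DS_BUILD_* switches are DeepSpeed's build flags, and the exact set chosen here is an assumption for this environment:

    # Print the same C++/CUDA extension op report for the current environment
    ds_report

    # ninja is required before any op reported as [NO] can be JIT-compiled
    pip install ninja

    # Assumption: pre-build the fused optimizers at install time instead of JIT at runtime
    DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed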
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. [YES].................. [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY]quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io async_io............... ...............[NO] [NO]....... .......[NO] [NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. 
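Every rank prints the extension-op table above at startup, which is why the raw log repeats it. It is the output of DeepSpeed's bundled `ds_report` utility: the two bracketed columns are installed-vs-compatible, so an op marked [NO] ....... [OKAY] is simply JIT-compiled on first use, and installing the libaio-dev system package is what would flip async_io to compatible. A minimal sketch for regenerating the report (assuming DeepSpeed is installed and `ds_report` is on PATH):

# Regenerate DeepSpeed's op-compatibility report for the current environment.
import subprocess

subprocess.run(["ds_report"], check=True)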
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
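The environment block above is likewise printed once per rank. Its fields come straight from the Python environment, so a rough equivalent (standard torch/deepspeed attributes; the real report's formatting differs) is:

# Rough sketch of where the environment-report fields come from.
import os
import torch
import deepspeed

print("torch install path ...", os.path.dirname(torch.__file__))
print("torch version ........", torch.__version__)
print("torch cuda version ...", torch.version.cuda)
print("deepspeed info .......", deepspeed.__version__)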
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1376383.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
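Two header values above are worth sanity-checking. The topology must satisfy world_size = data-parallel × tensor-parallel × pipeline-parallel, and rampup_batch_size = ['32', '32', '2_000_000'] means the global batch size climbs from 32 to 512 in increments of 32 spread over 2,000,000 samples. A quick sketch of the arithmetic (the even per-stage split is my reading of Megatron's linear rampup, not quoted from the log):

# Sanity-check the parallel topology and batch-size rampup logged above.
world_size = 64
tp, pp = 4, 4                        # tensor- and pipeline-model-parallel sizes
dp = world_size // (tp * pp)         # data-parallel size
assert dp == 4                       # matches "data-parallel-size: 4"

start, step = 32, 32                 # rampup start and increment
target = 512                         # final global batch size
ramp_samples = 2_000_000
n_increments = (target - start) // step           # 15 batch-size increases
samples_per_stage = ramp_samples // n_increments  # ~133_333 samples each (assumed even split)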
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
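The 431 dummy tokens follow from the arguments dump: the vocabulary is padded to a multiple of make_vocab_size_divisible_by (128) times tensor_model_parallel_size (4), so each of the 4 tensor-parallel shards holds an equal slice of the embedding table:

# Reproduce the vocab padding reported above.
import math

orig_vocab_size = 50257                      # GPT-2 BPE vocabulary
multiple = 128 * 4                           # make_vocab_size_divisible_by * TP size
padded = math.ceil(orig_vocab_size / multiple) * multiple
print(padded, padded - orig_vocab_size)      # 50688, 431 dummy tokens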
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-03 14:44:24,594] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
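Given the tensor-parallel and pipeline-parallel sizes of 4 just initialized, and the 64 ranks enumerated in the topology dump further down, the implied data-parallel size is 4; a quick consistency check:

```python
world_size = 64      # ranks in the ProcessCoord topology dump below
tp, pp = 4, 4        # tensor / pipeline parallel sizes from the log
dp = world_size // (tp * pp)
assert dp == 4       # matches the four `data=` groups in the topology
```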
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler PyTorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 26.448 seconds
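The compiler warning emitted for each JIT build is almost certainly cosmetic here (`c++` on this cluster is presumably just a GCC alias), since `torch.utils.cpp_extension` falls back to the `c++` binary when the `CXX` environment variable is unset. A hypothetical way to silence it would be to point `CXX` at g++ before the fused kernels are built:

```python
import os

# Hypothetical workaround: make cpp_extension's ABI check see g++ explicitly
# instead of the `c++` fallback. Must run before the fused kernels are JIT-built.
os.environ.setdefault("CXX", "g++")
```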
time to initialize megatron (seconds): 67.433
[after megatron is initialized] datetime: 2021-10-03 14:44:51
building GPT model ...
[2021-10-03 14:44:51,549] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-03 14:44:51,551] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-03 14:44:51,552] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.17 GB, percent = 20.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
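The topology dict enumerates coordinates with `model` varying fastest, then `data`, then `pipe`, so the global rank of each ProcessCoord can be reconstructed accordingly. A sketch of the apparent layout (not the PipeModelDataParallelTopology implementation itself):

```python
TP, DP, PP = 4, 4, 4   # tensor, data, pipeline parallel sizes

def global_rank(pipe: int, data: int, model: int) -> int:
    # model varies fastest, then data, then pipe -- matching the dump above
    return (pipe * DP + data) * TP + model

assert global_rank(0, 0, 0) == 0
assert global_rank(1, 0, 0) == 16
assert global_rank(2, 2, 3) == 43
assert global_rank(3, 3, 3) == 63
```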
[2021-10-03 14:44:52,075] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
stage=1 layers=6
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-03 14:44:52,447] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-03 14:44:52,447] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-03 14:44:52,448] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 38.56 GB, percent = 20.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
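The per-stage parameter counts above line up with a hidden size of 2048 and the padded 50688-token vocabulary (hidden size is an assumption; it is the usual 1B3 setting): the first and last stages each carry a vocab × hidden / 4 embedding shard on top of six transformer layers, and the last stage also holds the final MixedFusedLayerNorm. A worked check:

```python
hidden, vocab, tp = 2048, 50688, 4

middle_stage = 75_592_704            # six transformer layers, from the log
embed_shard = vocab * hidden // tp   # 25_952_256 parameters per TP shard
final_norm = 2 * hidden              # MixedFusedLayerNorm weight + bias

assert middle_stage + embed_shard == 101_544_960               # stage 0 ranks
assert middle_stage + embed_shard + final_norm == 101_549_056  # stage 3 ranks
```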
[2021-10-03 14:44:52,467] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-03 14:44:52,537] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-03 14:44:52,537] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-03 14:44:52,537] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-03 14:44:52,537] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-03 14:44:52,537] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-03 14:44:52,538] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-03 14:44:52,538] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-03 14:44:52,538] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-03 14:44:52,538] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-03 14:44:52,538] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-03 14:44:52,772] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-03 14:44:52,773] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-03 14:44:52,773] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-03 14:44:52,773] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-03 14:44:52,773] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-03 14:44:52,774] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-03 14:44:52,775] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-03 14:44:52,775] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-03 14:44:52,775] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
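The batch settings in this dump are mutually consistent (note that the `world_size ... 4` DeepSpeed prints is the data-parallel world as the engine sees it, not the 64 global ranks):

```python
micro_batch = 8      # train_micro_batch_size_per_gpu
grad_accum = 16      # gradient_accumulation_steps
dp_world = 4         # world_size as seen by the DeepSpeed engine
assert micro_batch * grad_accum * dp_world == 512   # train_batch_size
```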
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-03 14:44:53,066] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
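The scheduler state restored from the checkpoint describes a linear warmup to 2e-4 followed by cosine decay to 1e-5; a generic sketch of that curve (the warmup and total counts here are in samples rather than optimizer steps, which is an assumption about this run's sample-based LR accounting):

```python
import math

def lr_at(n, max_lr=2e-4, min_lr=1e-5, warmup=183_105, total=73_242_187):
    if n < warmup:
        return max_lr * n / warmup                      # linear warmup
    frac = min((n - warmup) / (total - warmup), 1.0)    # cosine decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))
```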
successfully loaded 4 ZeRO state_dicts for rank 26
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 12
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 5
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 2
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 50
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 8
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 36
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 46
loading 4 zero partition checkpoints for rank 35
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 32
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 88018
time (ms) | load-checkpoint: 1998.30
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.209483264estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter 
count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-03 14:44:55
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 5.150890 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.137 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.175 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.074 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-03 14:45:06
done with setup ...
training ...
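Each split is served from the three memory-mapped .npy index arrays loaded above: shuffle_idx randomizes sample order, sample_idx maps a sample to a (document, offset) window, and doc_idx holds the shuffled document order. A simplified sketch of the lookup for one 2048-token sample (attribute and helper names are illustrative, not the exact Megatron-DeepSpeed source):

import numpy as np

def get_sample(i, shuffle_idx, sample_idx, doc_idx, dataset):
    # Assemble one fixed-length training sample from the three index maps.
    i = shuffle_idx[i]                        # randomized position within the epoch
    doc_f, off_f = sample_idx[i]              # first document and starting token offset
    doc_l, off_l = sample_idx[i + 1]          # last document and ending token offset
    if doc_f == doc_l:                        # sample fits inside a single document
        return dataset.get(doc_idx[doc_f], offset=off_f, length=off_l - off_f + 1)
    parts = [dataset.get(doc_idx[doc_f], offset=off_f)]           # tail of the first doc
    parts += [dataset.get(doc_idx[d]) for d in range(doc_f + 1, doc_l)]
    parts.append(dataset.get(doc_idx[doc_l], length=off_l + 1))   # head of the last doc
    return np.concatenate(parts)

The "total number of samples: 131537224" line says one pass over the train split yields about 131.5M such windows; since only 73,242,187 are requested, a single pass suffices, hence "total number of epochs: 1".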
time (ms) | model-and-optimizer-setup: 3736.83 | train/valid/test-data-iterators-setup: 10166.21
Number of parameters: 1.209483264 / 1.62471936 / 1.624784896 billion (one message per rank; the value depends on the rank's pipeline stage)
Number of parameters without embeddings: 1.209483264 / 1.2095488 billion (one message per rank)
[before the start of training step] datetime: 2021-10-03 14:45:06
[2021-10-03 14:45:06,109] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-03 14:45:06,110] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 0] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5366.0 | max reserved: 5366.0
[Rank 1] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 2] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5366.0 | max reserved: 5366.0
[Rank 3] (after 88200 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5574.0 | max reserved: 5574.0
[Rank 16] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4572.0 | max reserved: 4572.0
[Rank 17] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4668.0 | max reserved: 4668.0
[Rank 18] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 19] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4652.0 | max reserved: 4652.0
[Rank 32] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 33] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 34] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4252.0 | max reserved: 4252.0
[Rank 35] (after 88200 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 48] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6742.0 | max reserved: 6742.0
[Rank 49] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7182.0 | max reserved: 7182.0
[Rank 50] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7234.0 | max reserved: 7234.0
[Rank 51] (after 88200 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6614.0 | max reserved: 6614.0
iteration 88200/ 152972 | consumed samples: 40078784 | elapsed time per iteration (ms): 5983.6 | learning rate: 9.166E-05 | global batch size: 512 | lm loss: 2.808177E+00 | loss scale: 524288.0 | grad norm: 42870.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 88400/ 152972 | consumed samples: 40181184 | elapsed time per iteration (ms): 5917.8 | learning rate: 9.125E-05 | global batch size: 512 | lm loss: 2.807703E+00 | loss scale: 524288.0 | grad norm: 44928.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
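The per-rank memory report above maps directly onto PyTorch's CUDA allocator counters. A minimal sketch that reproduces the four columns (the formatting mimics the log; this is not the exact Megatron helper):

import torch

def memory_report(rank, iteration):
    # Report current/peak allocated tensors and current/peak blocks cached by the allocator.
    mb = 1 << 20
    return (f"[Rank {rank}] (after {iteration} iterations) memory (MB)"
            f" | allocated: {torch.cuda.memory_allocated() / mb}"
            f" | max allocated: {torch.cuda.max_memory_allocated() / mb}"
            f" | reserved: {torch.cuda.memory_reserved() / mb}"
            f" | max reserved: {torch.cuda.max_memory_reserved() / mb}")

reserved is always at least allocated because the caching allocator keeps freed blocks around for reuse; the higher reservations on ranks 0-3 and 48-51 are consistent with the first and last pipeline stages also holding the embedding and loss layers.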
saving checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 15:32:50,853] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step88500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1531.40
iteration 88600/ 152972 | consumed samples: 40283584 | elapsed time per iteration (ms): 5928.2 | learning rate: 9.083E-05 | global batch size: 512 | lm loss: 2.806894E+00 | loss scale: 1048576.0 | grad norm: 95015.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 88800/ 152972 | consumed samples: 40385984 | elapsed time per iteration (ms): 5925.3 | learning rate: 9.042E-05 | global batch size: 512 | lm loss: 2.807213E+00 | loss scale: 524288.0 | grad norm: 47670.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89000/ 152972 | consumed samples: 40488384 | elapsed time per iteration (ms): 5926.2 | learning rate: 9.001E-05 | global batch size: 512 | lm loss: 2.809571E+00 | loss scale: 524288.0 | grad norm: 52891.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 89000 | lm loss value: 2.763009E+00 | lm loss PPL: 1.584745E+01 |
-------------------------------------------------------------------------------------------------
iteration 89200/ 152972 | consumed samples: 40590784 | elapsed time per iteration (ms): 6770.9 | learning rate: 8.960E-05 | global batch size: 512 | lm loss: 2.810171E+00 | loss scale: 1048576.0 | grad norm: 52886.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89400/ 152972 | consumed samples: 40693184 | elapsed time per iteration (ms): 5943.6 | learning rate: 8.919E-05 | global batch size: 512 | lm loss: 2.811609E+00 | loss scale: 524288.0 | grad norm: 50190.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89600/ 152972 | consumed samples: 40795584 | elapsed time per iteration (ms): 5951.2 | learning rate: 8.878E-05 | global batch size: 512 | lm loss: 2.814046E+00 | loss scale: 524288.0 | grad norm: 54980.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 89800/ 152972 | consumed samples: 40897984 | elapsed time per iteration (ms): 5946.8 | learning rate: 8.836E-05 | global batch size: 512 | lm loss: 2.815339E+00 | loss scale: 524288.0 | grad norm: 49822.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-03 18:04:01,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=199, lr=[8.795630573517453e-05, 8.795630573517453e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 90000/ 152972 | consumed samples: 41000384 | elapsed time per iteration (ms): 5923.9 | learning rate: 8.796E-05 | global batch size: 512 | lm loss: 2.812233E+00 | loss scale: 524288.0 | grad norm: 54569.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 90000 loss: 2.7652 iter time (s): 0.003 samples/sec: 173297.507
-------------------------------------------------------------------------------------------------
 validation loss at iteration 90000 | lm loss value: 2.760447E+00 | lm loss PPL: 1.580691E+01 |
-------------------------------------------------------------------------------------------------
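The PPL column is simply the exponential of the mean LM cross-entropy loss; for the iteration-89000 validation record above:

import math
print(math.exp(2.763009))  # 15.8474..., matching the logged lm loss PPL of 1.584745E+01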
saving checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 18:06:53,787] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step90000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1383.92
iteration 90200/ 152972 | consumed samples: 41102784 | elapsed time per iteration (ms): 6787.2 | learning rate: 8.755E-05 | global batch size: 512 | lm loss: 2.812646E+00 | loss scale: 262144.0 | grad norm: 26039.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90400/ 152972 | consumed samples: 41205184 | elapsed time per iteration (ms): 5942.6 | learning rate: 8.714E-05 | global batch size: 512 | lm loss: 2.813895E+00 | loss scale: 262144.0 | grad norm: 25433.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90600/ 152972 | consumed samples: 41307584 | elapsed time per iteration (ms): 5920.8 | learning rate: 8.673E-05 | global batch size: 512 | lm loss: 2.814933E+00 | loss scale: 262144.0 | grad norm: 26784.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 90800/ 152972 | consumed samples: 41409984 | elapsed time per iteration (ms): 5934.2 | learning rate: 8.631E-05 | global batch size: 512 | lm loss: 2.814415E+00 | loss scale: 524288.0 | grad norm: 50547.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91000/ 152972 | consumed samples: 41512384 | elapsed time per iteration (ms): 5936.7 | learning rate: 8.591E-05 | global batch size: 512 | lm loss: 2.814566E+00 | loss scale: 524288.0 | grad norm: 52181.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 91000 | lm loss value: 2.762884E+00 | lm loss PPL: 1.584548E+01 |
-------------------------------------------------------------------------------------------------
iteration 91200/ 152972 | consumed samples: 41614784 | elapsed time per iteration (ms): 6845.7 | learning rate: 8.550E-05 | global batch size: 512 | lm loss: 2.811956E+00 | loss scale: 524288.0 | grad norm: 53224.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 91400/ 152972 | consumed samples: 41717184 | elapsed time per iteration (ms): 5954.5 | learning rate: 8.509E-05 | global batch size: 512 | lm loss: 2.813011E+00 | loss scale: 262144.0 | grad norm: 27426.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 20:38:22,312] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step91500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1489.06
iteration 91600/ 152972 | consumed samples: 41819584 | elapsed time per iteration (ms): 5954.2 | learning rate: 8.468E-05 | global batch size: 512 | lm loss: 2.810743E+00 | loss scale: 262144.0 | grad norm: 26306.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
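The iteration records carry enough to derive throughput. With a global batch of 512 sequences of 2048 tokens (the sequence length appears in the *_2048sl_* index-map file names earlier), a typical 5.94 s iteration works out to roughly:

iter_time_s  = 5.9426      # elapsed time per iteration (s), iteration 90400 above
global_batch = 512         # sequences per optimizer step
seq_len      = 2048        # tokens per sequence, from the index-map names
samples_per_s = global_batch / iter_time_s     # ~86 samples/s
tokens_per_s  = samples_per_s * seq_len        # ~176,000 tokens/s
print(f"{samples_per_s:.1f} samples/s, {tokens_per_s:,.0f} tokens/s")

The periodically slower windows (roughly 6.8-7.3 s per iteration at 89200, 90200, 91200, and so on) are the 200-iteration windows that also ran a validation pass.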
iteration 91800/ 152972 | consumed samples: 41921984 | elapsed time per iteration (ms): 5941.4 | learning rate: 8.428E-05 | global batch size: 512 | lm loss: 2.813168E+00 | loss scale: 262144.0 | grad norm: 25460.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-03 21:27:54,291] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=204, lr=[8.386911331302633e-05, 8.386911331302633e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 92000 loss: 2.8185 iter time (s): 0.003 samples/sec: 171865.499
iteration 92000/ 152972 | consumed samples: 42024384 | elapsed time per iteration (ms): 5945.0 | learning rate: 8.387E-05 | global batch size: 512 | lm loss: 2.813474E+00 | loss scale: 524288.0 | grad norm: 53950.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 92000 | lm loss value: 2.760389E+00 | lm loss PPL: 1.580599E+01 |
-------------------------------------------------------------------------------------------------
iteration 92200/ 152972 | consumed samples: 42126784 | elapsed time per iteration (ms): 6823.4 | learning rate: 8.346E-05 | global batch size: 512 | lm loss: 2.811664E+00 | loss scale: 524288.0 | grad norm: 53543.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92400/ 152972 | consumed samples: 42229184 | elapsed time per iteration (ms): 5927.5 | learning rate: 8.306E-05 | global batch size: 512 | lm loss: 2.811328E+00 | loss scale: 262144.0 | grad norm: 25433.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92600/ 152972 | consumed samples: 42331584 | elapsed time per iteration (ms): 5939.1 | learning rate: 8.265E-05 | global batch size: 512 | lm loss: 2.813017E+00 | loss scale: 262144.0 | grad norm: 24566.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 92800/ 152972 | consumed samples: 42433984 | elapsed time per iteration (ms): 5932.0 | learning rate: 8.225E-05 | global batch size: 512 | lm loss: 2.811905E+00 | loss scale: 131072.0 | grad norm: 13829.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93000/ 152972 | consumed samples: 42536384 | elapsed time per iteration (ms): 5939.4 | learning rate: 8.184E-05 | global batch size: 512 | lm loss: 2.811901E+00 | loss scale: 131072.0 | grad norm: 13141.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
-------------------------------------------------------------------------------------------------
 validation loss at iteration 93000 | lm loss value: 2.758544E+00 | lm loss PPL: 1.577685E+01 |
-------------------------------------------------------------------------------------------------
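The loss-scale column only ever moves in powers of two (65536 up to 1048576 in this window), and the cumulative skipped= counter in the rank-0 step lines grows by a few per 2000 iterations (199 at step 90000, 204 at 92000): the signature of dynamic loss scaling, which skips a step and halves the scale on overflow, then doubles it again after a window of clean steps. A generic sketch (the window size and bounds are illustrative assumptions, not values read from this run):

class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 19, scale_window=1000, min_scale=1.0):
        self.scale = init_scale
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0
        self.skipped = 0            # cumulative, like the skipped= counter in the log

    def update(self, found_overflow):
        # Returns True if the optimizer step should be applied this iteration.
        if found_overflow:
            self.scale = max(self.scale / 2, self.min_scale)   # halve the scale and skip
            self.good_steps = 0
            self.skipped += 1
            return False
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:           # double after a clean window
            self.scale *= 2
        return True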
saving checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-03 23:12:40,824] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step93000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.09
iteration 93200/ 152972 | consumed samples: 42638784 | elapsed time per iteration (ms): 6806.5 | learning rate: 8.143E-05 | global batch size: 512 | lm loss: 2.808491E+00 | loss scale: 262144.0 | grad norm: 24290.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93400/ 152972 | consumed samples: 42741184 | elapsed time per iteration (ms): 5922.2 | learning rate: 8.103E-05 | global batch size: 512 | lm loss: 2.811693E+00 | loss scale: 262144.0 | grad norm: 25583.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93600/ 152972 | consumed samples: 42843584 | elapsed time per iteration (ms): 5950.4 | learning rate: 8.063E-05 | global batch size: 512 | lm loss: 2.808292E+00 | loss scale: 262144.0 | grad norm: 27502.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 93800/ 152972 | consumed samples: 42945984 | elapsed time per iteration (ms): 5945.1 | learning rate: 8.022E-05 | global batch size: 512 | lm loss: 2.808726E+00 | loss scale: 262144.0 | grad norm: 27107.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 00:51:37,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=207, lr=[7.98186465205186e-05, 7.98186465205186e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 94000/ 152972 | consumed samples: 43048384 | elapsed time per iteration (ms): 5929.8 | learning rate: 7.982E-05 | global batch size: 512 | lm loss: 2.809057E+00 | loss scale: 524288.0 | grad norm: 49270.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 94000 loss: 2.7582 iter time (s): 0.003 samples/sec: 173169.704
-------------------------------------------------------------------------------------------------
 validation loss at iteration 94000 | lm loss value: 2.754041E+00 | lm loss PPL: 1.570597E+01 |
-------------------------------------------------------------------------------------------------
iteration 94200/ 152972 | consumed samples: 43150784 | elapsed time per iteration (ms): 7275.2 | learning rate: 7.942E-05 | global batch size: 512 | lm loss: 2.810958E+00 | loss scale: 262144.0 | grad norm: 25608.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 94400/ 152972 | consumed samples: 43253184 | elapsed time per iteration (ms): 5939.2 | learning rate: 7.902E-05 | global batch size: 512 | lm loss: 2.809255E+00 | loss scale: 131072.0 | grad norm: 12782.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 01:45:36,743] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step94500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1512.70
iteration 94600/ 152972 | consumed samples: 43355584 | elapsed time per iteration (ms): 5959.9 | learning rate: 7.862E-05 | global batch size: 512 | lm loss: 2.809127E+00 | loss scale: 131072.0 | grad norm: 12205.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
152972 | consumed samples: 43457984 | elapsed time per iteration (ms): 5955.8 | learning rate: 7.822E-05 | global batch size: 512 | lm loss: 2.811417E+00 | loss scale: 262144.0 | grad norm: 27648.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95000/ 152972 | consumed samples: 43560384 | elapsed time per iteration (ms): 5940.9 | learning rate: 7.781E-05 | global batch size: 512 | lm loss: 2.809105E+00 | loss scale: 262144.0 | grad norm: 101342.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 95000 | lm loss value: 2.758293E+00 | lm loss PPL: 1.577290E+01 | ------------------------------------------------------------------------------------------------- iteration 95200/ 152972 | consumed samples: 43662784 | elapsed time per iteration (ms): 7325.3 | learning rate: 7.741E-05 | global batch size: 512 | lm loss: 2.806445E+00 | loss scale: 262144.0 | grad norm: 26459.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95400/ 152972 | consumed samples: 43765184 | elapsed time per iteration (ms): 5952.7 | learning rate: 7.701E-05 | global batch size: 512 | lm loss: 2.809082E+00 | loss scale: 524288.0 | grad norm: 51226.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95600/ 152972 | consumed samples: 43867584 | elapsed time per iteration (ms): 5958.8 | learning rate: 7.661E-05 | global batch size: 512 | lm loss: 2.805829E+00 | loss scale: 524288.0 | grad norm: 47963.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 95800/ 152972 | consumed samples: 43969984 | elapsed time per iteration (ms): 5944.0 | learning rate: 7.622E-05 | global batch size: 512 | lm loss: 2.808121E+00 | loss scale: 1048576.0 | grad norm: 115697.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-04 04:19:04,979] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=211, lr=[7.581883961368615e-05, 7.581883961368615e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 96000 loss: 2.7976 iter time (s): 0.003 samples/sec: 172446.761 iteration 96000/ 152972 | consumed samples: 44072384 | elapsed time per iteration (ms): 5986.3 | learning rate: 7.582E-05 | global batch size: 512 | lm loss: 2.806327E+00 | loss scale: 1048576.0 | grad norm: 99384.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 96000 | lm loss value: 2.752790E+00 | lm loss PPL: 1.568634E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 04:23:31,693] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step96000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1460.42 iteration 96200/ 152972 | consumed samples: 44174784 | elapsed time per 
iteration (ms): 7289.2 | learning rate: 7.542E-05 | global batch size: 512 | lm loss: 2.807231E+00 | loss scale: 524288.0 | grad norm: 54964.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96400/ 152972 | consumed samples: 44277184 | elapsed time per iteration (ms): 5946.2 | learning rate: 7.503E-05 | global batch size: 512 | lm loss: 2.807410E+00 | loss scale: 262144.0 | grad norm: 24982.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96600/ 152972 | consumed samples: 44379584 | elapsed time per iteration (ms): 5955.0 | learning rate: 7.463E-05 | global batch size: 512 | lm loss: 2.804028E+00 | loss scale: 262144.0 | grad norm: 25509.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 96800/ 152972 | consumed samples: 44481984 | elapsed time per iteration (ms): 5955.3 | learning rate: 7.424E-05 | global batch size: 512 | lm loss: 2.805443E+00 | loss scale: 262144.0 | grad norm: 25994.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97000/ 152972 | consumed samples: 44584384 | elapsed time per iteration (ms): 5955.6 | learning rate: 7.384E-05 | global batch size: 512 | lm loss: 2.803409E+00 | loss scale: 262144.0 | grad norm: 25803.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 97000 | lm loss value: 2.753859E+00 | lm loss PPL: 1.570311E+01 | ------------------------------------------------------------------------------------------------- iteration 97200/ 152972 | consumed samples: 44686784 | elapsed time per iteration (ms): 7310.3 | learning rate: 7.345E-05 | global batch size: 512 | lm loss: 2.807514E+00 | loss scale: 131072.0 | grad norm: 12799.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97400/ 152972 | consumed samples: 44789184 | elapsed time per iteration (ms): 5990.9 | learning rate: 7.306E-05 | global batch size: 512 | lm loss: 2.804385E+00 | loss scale: 65536.0 | grad norm: 6307.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 06:57:02,691] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step97500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1439.79 iteration 97600/ 152972 | consumed samples: 44891584 | elapsed time per iteration (ms): 5966.4 | learning rate: 7.266E-05 | global batch size: 512 | lm loss: 2.803197E+00 | loss scale: 65536.0 | grad norm: 6938.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 97800/ 152972 | consumed samples: 44993984 | elapsed time per iteration (ms): 5972.5 | learning rate: 7.227E-05 | global batch size: 512 | lm loss: 2.802648E+00 | loss scale: 131072.0 | grad norm: 12930.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-04 07:46:45,222] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=98000, skipped=217, lr=[7.187929697477929e-05, 7.187929697477929e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 98000/ 152972 | consumed samples: 45096384 | elapsed time per iteration (ms): 5959.7 | learning rate: 7.188E-05 | global batch size: 512 | lm loss: 2.803077E+00 | loss scale: 131072.0 | grad norm: 12981.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) steps: 98000 loss: 2.8400 iter time (s): 0.003 samples/sec: 173625.320 ------------------------------------------------------------------------------------------------- validation loss at iteration 98000 | lm loss value: 2.744631E+00 | lm loss PPL: 1.555887E+01 | ------------------------------------------------------------------------------------------------- iteration 98200/ 152972 | consumed samples: 45198784 | elapsed time per iteration (ms): 7696.6 | learning rate: 7.149E-05 | global batch size: 512 | lm loss: 2.799447E+00 | loss scale: 131072.0 | grad norm: 12783.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98400/ 152972 | consumed samples: 45301184 | elapsed time per iteration (ms): 5960.1 | learning rate: 7.110E-05 | global batch size: 512 | lm loss: 2.800086E+00 | loss scale: 262144.0 | grad norm: 27676.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98600/ 152972 | consumed samples: 45403584 | elapsed time per iteration (ms): 5939.2 | learning rate: 7.071E-05 | global batch size: 512 | lm loss: 2.802239E+00 | loss scale: 262144.0 | grad norm: 26204.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 98800/ 152972 | consumed samples: 45505984 | elapsed time per iteration (ms): 6202.3 | learning rate: 7.032E-05 | global batch size: 512 | lm loss: 2.799443E+00 | loss scale: 524288.0 | grad norm: 50434.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 99000/ 152972 | consumed samples: 45608384 | elapsed time per iteration (ms): 5940.7 | learning rate: 6.993E-05 | global batch size: 512 | lm loss: 2.802682E+00 | loss scale: 524288.0 | grad norm: 50073.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) ------------------------------------------------------------------------------------------------- validation loss at iteration 99000 | lm loss value: 2.748792E+00 | lm loss PPL: 1.562374E+01 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-04 09:35:28,488] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step99000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1504.53 iteration 99200/ 152972 | consumed samples: 45710784 | elapsed time per iteration (ms): 6868.2 | learning rate: 6.954E-05 | global batch size: 512 | lm loss: 2.800038E+00 | loss scale: 262144.0 | grad norm: 25434.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 99400/ 152972 | consumed samples: 45813184 | elapsed time per iteration (ms): 5957.7 
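The validation blocks above report both the raw lm loss and its perplexity, and the two columns are related by PPL = exp(loss). As a quick sanity check against the logged values, here is a minimal Python sketch; the constants are copied from the iteration 99000 validation block and the surrounding iteration records:

import math

# Validation values logged at iteration 99000.
lm_loss = 2.748792
logged_ppl = 1.562374e01

# Perplexity is the exponential of the mean cross-entropy loss.
assert math.isclose(math.exp(lm_loss), logged_ppl, rel_tol=1e-5)

# The sample accounting is also consistent: records are emitted every
# 200 iterations at a global batch size of 512, i.e. 102,400 samples
# between consecutive records (e.g. 45710784 - 45608384).
assert 200 * 512 == 45_710_784 - 45_608_384 == 102_400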
saving checkpoint at iteration 99586 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 10:33:45,007] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step99586/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 99586 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1585.87
[exiting program after 1190.0101410309474 minutes] datetime: 2021-10-04 10:33:46
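One pattern worth noting in the run above: the loss scale column moves between 65536.0 and 1048576.0 while the cumulative skipped counter in the DeepSpeed step lines creeps up (skipped=204, 207, 211, 217). That is the signature of dynamic fp16 loss scaling: a step whose gradients overflow is skipped and the scale is halved, and after a window of overflow-free steps the scale is doubled again. A minimal Python sketch of that policy, assuming the standard halve-on-overflow / double-after-N-good-steps scheme (the constants are illustrative, not read from this run's configuration):

class DynamicLossScaler:
    """Illustrative halve-on-overflow / double-after-N-good-steps scaler."""

    def __init__(self, init_scale=2.0**18, scale_window=1000, min_scale=1.0):
        self.scale = init_scale        # e.g. 262144.0, as in the log
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0
        self.skipped = 0               # cumulative, like `skipped=` above

    def update(self, overflow: bool) -> bool:
        """Return True if the optimizer step should be applied."""
        if overflow:
            self.skipped += 1
            self.good_steps = 0
            self.scale = max(self.scale / 2, self.min_scale)
            return False               # this iteration is skipped
        self.good_steps += 1
        if self.good_steps % self.scale_window == 0:
            self.scale *= 2            # probe a larger scale again
        return True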
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
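The banner above is printed once per launched rank (the identical copies have been collapsed to one here): the launcher defaults OMP_NUM_THREADS to 1 so that many ranks on a node cannot each spin up a full set of OpenMP threads. To tune it as the message suggests, the variable has to be set in each worker's environment before the numerical libraries size their thread pools; a minimal sketch, where the value 4 is an arbitrary example rather than a recommendation from this log:

import os

# Must be set before torch/numpy are imported, since they configure
# their OpenMP thread pools at import time.
os.environ["OMP_NUM_THREADS"] = "4"

import torch  # deliberately imported after the variable is set

# Should now report the intra-op thread count picked up from the env.
print(torch.get_num_threads())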
2021-10-04 10:34:14.452668: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
op name................installed ................ ................ ..installed installed installed .. compatible.. .. compatiblecompatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES]............... ............... ..................... [YES] [YES][OKAY] [YES] ...... ...... ...... [OKAY][OKAY] [OKAY] fused_adamfused_adam fused_adam fused_adam............. ............. ..........................[NO] [NO][NO]....... .............. [OKAY] [OKAY] [OKAY] fused_lamb [NO] .............fused_lamb fused_lamb............. ....... [NO] [NO]............. [OKAY] .......[NO] ....... .......[OKAY] [OKAY]fused_lamb [OKAY] ............. [NO] ....... [OKAY] sparse_attnsparse_attn ........................sparse_attn [NO][NO]............ .......[NO]....... .......[OKAY][OKAY] [OKAY] sparse_attntransformertransformer transformer ........................ [NO] ................... ............ [NO] [OKAY] [NO].......[NO] .......[OKAY]....... [OKAY][OKAY] transformerstochastic_transformer .............stochastic_transformer stochastic_transformer [NO][NO] . ........ ....... [NO][OKAY][OKAY] [NO] ..............stochastic_transformer [OKAY][OKAY] . [NO] ....... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op nameop nameop name op name................ ................ ................................installed installed..installedinstalled .... compatiblecompatiblecompatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam[YES] cpu_adam .................................... [YES] ............... [OKAY] [YES]...... [YES] ......[OKAY]...... [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam fused_lamb [NO]............. ............. ............. ....... [NO] [NO][NO] [OKAY] .............. .......[OKAY][OKAY] fused_lamb [OKAY] .............fused_lamb fused_lamb[NO]............. .................... sparse_attn[NO] [OKAY] ............[NO] ....... [NO] [OKAY] ....... ....... [OKAY][OKAY] sparse_attn transformer............ ............sparse_attn[NO] [NO]....... ............ .......sparse_attn [OKAY]............ [NO] [OKAY].......[NO] transformer [OKAY] ....... stochastic_transformer............ transformer. [OKAY][NO] [NO] ................... transformer....... [NO] [OKAY] ............[OKAY] .......[NO]stochastic_transformer [OKAY]........ [NO][OKAY] .......stochastic_transformer [OKAY]stochastic_transformer . .[NO] [NO]....... .......[OKAY] [OKAY] ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name op name................ ................ ................ 
installed................ installedinstalled.. ..installed .. compatible -------------------------------------------------- compatiblecompatible.. --------------------------------------------------cpu_adam -------------------------------------------------- ...............compatible [YES]-------------------------------------------------- cpu_adam...... cpu_adam...............[OKAY] ...............[YES] [YES] ......cpu_adam ...... [OKAY][OKAY] fused_adam ............. [NO]............... .......[YES] fused_adam [OKAY] fused_adam............. .............fused_lamb...... [NO] .............[NO][OKAY] .......[NO]....... [OKAY].......[OKAY] [OKAY] fused_lambfused_lamb ..........................fused_adam [NO].............[NO] sparse_attn ....... ....... [NO]............[OKAY] [OKAY]....... [NO] ....... [OKAY][OKAY] fused_lambtransformer sparse_attnsparse_attn............ .....................................[NO] [NO] [NO] .......[NO]....... [OKAY]....... .......[OKAY][OKAY] transformer[OKAY] transformer ............ stochastic_transformer ............ [NO] . [NO]....... [NO] .......[OKAY]....... [OKAY]sparse_attn[OKAY] stochastic_transformer .stochastic_transformer [NO]............ . ....... [NO] [OKAY] [NO] ....... [OKAY]....... [OKAY] transformer ............ [NO] ....... [OKAY] ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name op name................................op name installed ................installed ................ .. installed..installed ..compatible..compatible compatible ---------------------------------------------------------------------------------------------------- compatible -------------------------------------------------- stochastic_transformer . [NO] ....... [OKAY] -------------------------------------------------- cpu_adamcpu_adam .............................. cpu_adam[YES][YES] cpu_adam...... ............... ......[OKAY]............... [YES][OKAY][YES] ............ [OKAY][OKAY] fused_adam ............. [NO]fused_adam .................... [OKAY][NO] fused_adamfused_adam....... fused_lamb.............[OKAY]............. ............. [NO] fused_lamb[NO] [NO] ....... ........................... [NO] [OKAY] [OKAY][OKAY] ....... [OKAY]fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] [OKAY]sparse_attn transformer sparse_attn........................transformer ............[NO][NO]............ .......[NO]....... [NO] [OKAY][OKAY]....... .......[OKAY] stochastic_transformertransformer[OKAY] .stochastic_transformer............ [NO][NO].transformer [NO].............. ............ ....... [OKAY] [OKAY] [OKAY][NO] ....... [OKAY]stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .... .. 
..compatible compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ...............[YES]............... ............... [YES] ...... [YES] [YES] [OKAY]...... ...... ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adamfused_adam[OKAY] ....................................... fused_lamb [NO][NO] [NO].................... ....... .......[OKAY][NO] [OKAY][OKAY]....... fused_lamb[OKAY] .............fused_lambfused_lamb [NO] ............. ............. ....... [NO] [NO][OKAY] ..............sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attn transformer............ sparse_attn[NO]............ sparse_attn ................... [NO] ............[OKAY][NO]....... [NO][OKAY]....... transformer.......[OKAY] ............[OKAY]stochastic_transformer transformer[NO]. transformer ............ [NO] ................... [NO] .......[OKAY][NO]....... .......[OKAY][OKAY] [OKAY]stochastic_transformer ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] stochastic_transformer. stochastic_transformer[NO]. ....... .[OKAY][NO] [NO] .............. [OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op nameop name ................ ................................ ................installedinstalled installedinstalled.... .. ..compatible compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adamcpu_adam [YES]..............................[YES] ......[YES] [YES]...... ...... [OKAY] [OKAY]...... [OKAY] [OKAY] fused_adam ............. fused_adamfused_adam[NO] fused_adam.......................... ....... [NO]............. [NO] [OKAY] .......[NO] ....... [OKAY] ....... [OKAY]fused_lamb fused_lamb.............[OKAY]fused_lamb ............. [NO] ............. fused_lamb[NO] ....... [NO]....... ............. [OKAY] .......[OKAY] [NO] [OKAY]....... [OKAY] sparse_attn ............sparse_attn [NO]sparse_attn............ sparse_attn ....... [NO]............[OKAY]............ ....... [NO] transformer[OKAY] [NO] ................... .......[NO]transformer [OKAY] [OKAY] ............ ....... [OKAY][NO]transformer transformer....... stochastic_transformer[OKAY] ............ ............ . [NO] [NO] [NO]stochastic_transformer ....... .............. . [OKAY] [OKAY][OKAY][NO] ....... [OKAY]stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop nameop name ................................................ op name installed installedinstalled................ .. ....installed compatible compatiblecompatible..-------------------------------------------------- ----------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adamcpu_adam ..............................cpu_adam cpu_adam [YES] ..............................[YES]...... [YES][YES]......[OKAY] ...... [OKAY]...... [OKAY] [OKAY] fused_adam .............fused_adam [NO]............. .......[NO]fused_adam .............fused_adam[OKAY] .......[NO] [OKAY].................... fused_lamb [OKAY].............fused_lamb[NO] [NO] ........................... fused_lamb[OKAY][NO] [OKAY] ............. ....... fused_lamb[NO][OKAY] ............. ....... [OKAY] sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] sparse_attn ............transformer sparse_attn [NO]............ [NO]................... [OKAY] [NO]....... .......[OKAY]transformer sparse_attn [OKAY] ............ ............stochastic_transformer [NO] . transformer[NO] ....... [NO] ............ [OKAY].............. [OKAY][OKAY]stochastic_transformer[NO] . .......[NO]transformer [OKAY]................... [OKAY][NO] stochastic_transformer ........ [NO][OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
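The report above is printed once per rank, which is why the raw log interleaves many copies of it. A minimal sketch of regenerating it cleanly from a single process, assuming the `ds_report` console script that ships with DeepSpeed is on PATH (the async_io warning itself is resolved by installing the libaio-dev system package, as the log suggests):

    import subprocess

    # Reproduce the DeepSpeed op-compatibility and environment report
    # shown above. Running it once, outside the launcher, avoids the
    # interleaved per-rank output seen in this log.
    result = subprocess.run(["ds_report"], capture_output=True, text=True, check=True)
    print(result.stdout)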
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ................... ...................0.4.2+72ce55a, 72ce55a, big-science 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed infodeepspeed info ...................................... 0.4.2+72ce55a, 72ce55a, big-science0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: 
git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ****  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
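Every rank in the job prints its own copy of the environment report, so with 64 processes sharing one stdout the raw log interleaves dozens of identical blocks; only one copy of each is kept above. A minimal sketch of the usual remedy, gating diagnostics on the global rank with torch.distributed (Megatron ships a similar print_rank_0 helper; this standalone version is illustrative):

```python
import torch.distributed as dist

def print_rank_0(message):
    """Print from rank 0 only, so an N-process job emits one copy instead of N."""
    if dist.is_available() and dist.is_initialized():
        if dist.get_rank() == 0:
            print(message, flush=True)
    else:
        print(message, flush=True)  # single-process fallback
```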
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 4
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.1389469.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ False
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... None
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
  micro_batch_size ................................ 8
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  no_train ........................................ None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.rotary
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> setting tensorboard ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
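Several of the numbers in the argument dump cross-check against each other: the 64-GPU world size factors into the 4 x 4 x 4 data/tensor/pipeline grid, the global batch of 512 fixes the gradient-accumulation depth given micro_batch_size and data_parallel_size, and the "431 dummy tokens" line follows from padding the vocabulary to a multiple of make_vocab_size_divisible_by times the tensor-parallel size (Megatron's usual padding rule; treat the exact formula as an assumption here):

```python
import math

dp, tp, pp = 4, 4, 4
assert dp * tp * pp == 64                       # world_size

micro, global_batch = 8, 512
grad_acc = global_batch // (micro * dp)         # 512 / (8 * 4) = 16 accumulation steps

vocab = 50257
divisor = 128 * tp                              # make_vocab_size_divisible_by * tp = 512
padded = math.ceil(vocab / divisor) * divisor   # 50688
assert padded - vocab == 431                    # "padded vocab ... with 431 dummy tokens"
```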
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-04 10:34:30,443] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.322 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
                               !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                              !! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
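The "model parallel seed: 3952" reported a few lines up is consistent with deriving per-rank seeds from the base seed of 1234: tensor-parallel peers get distinct offsets (so their dropout masks differ), while data-parallel replicas keep the base seed (so they stay bit-identical). A sketch of that derivation, assuming Megatron's constant offset of 2718 in model_parallel_cuda_manual_seed:

```python
def cuda_rng_seeds(base_seed, tp_rank):
    # Offset the base seed per tensor-parallel rank; leave it unchanged
    # across data-parallel replicas.
    tensor_parallel_seed = base_seed + 2718 + tp_rank
    data_parallel_seed = base_seed
    return tensor_parallel_seed, data_parallel_seed

assert cuda_rng_seeds(1234, 0) == (3952, 1234)  # matches the log line above
```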
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 21.245 seconds
time to initialize megatron (seconds): 44.013
[after megatron is initialized] datetime: 2021-10-04 10:34:52
building GPT model ...
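The ninja messages above come from PyTorch's JIT C++/CUDA extension loader; "ninja: no work to do." means the cached objects under megatron/fused_kernels/build were reused rather than recompiled, which is why all three kernels load in a few seconds. A minimal sketch of how such a module is built and loaded (the file paths here are illustrative, not Megatron's exact sources):

```python
from torch.utils.cpp_extension import load

# JIT-compile (or reuse a cached build of) a fused CUDA kernel. ninja skips
# the build entirely when everything in build_directory is already up to date.
scaled_masked_softmax_cuda = load(
    name="scaled_masked_softmax_cuda",
    sources=[
        "fused_kernels/scaled_masked_softmax.cpp",      # hypothetical paths
        "fused_kernels/scaled_masked_softmax_cuda.cu",
    ],
    build_directory="fused_kernels/build",
    extra_cuda_cflags=["-O3"],
    verbose=True,
)
```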
[2021-10-04 10:34:52,097] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
  warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
  warnings.warn(
[2021-10-04 10:34:52,099] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-10-04 10:34:52,099] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 40.64 GB, percent = 21.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3,
 ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7,
 ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11,
 ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15,
 ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19,
 ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23,
 ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27,
 ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31,
 ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35,
 ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39,
 ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43,
 ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47,
 ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51,
 ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55,
 ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59,
 ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
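The 64-entry topology above is completely regular: rank indices increase with the model (tensor) axis fastest, then data, then pipe. A small sketch that reproduces the mapping, with the 4x4x4 axis sizes read off the dump rather than from any config shown here:

    PIPE, DATA, MODEL = 4, 4, 4  # axis sizes inferred from the topology dump

    def coord_to_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe
        return (pipe * DATA + data) * MODEL + model

    # spot-checks against the log above
    assert coord_to_rank(0, 0, 3) == 3
    assert coord_to_rank(1, 0, 0) == 16
    assert coord_to_rank(2, 1, 2) == 38
    assert coord_to_rank(3, 3, 3) == 63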
[2021-10-04 10:34:52,631] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
[2021-10-04 10:34:52,956] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-04 10:34:52,957] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB  Max_MA 0.22 GB  CA 0.24 GB  Max_CA 0 GB
[2021-10-04 10:34:52,958] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 41.02 GB, percent = 21.9%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
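The per-rank parameter counts are reproducible by hand. Assuming a hidden size of 2048, 24 transformer layers, a padded vocabulary of 50688 and tensor-parallel degree 4 (plausible for this 1.3B run, but not all of these are printed in this excerpt): matmul weights and their column-parallel biases are split four ways, while layernorms and row-parallel output biases are replicated on every tensor rank. A back-of-the-envelope sketch:

    H, TP, VOCAB = 2048, 4, 50688   # assumed model shape (see lead-in)

    # per transformer layer, per tensor-parallel rank:
    #   split across TP: QKV 3H^2 + 3H, attn output H^2,
    #                    MLP up 4H^2 + 4H, MLP down 4H^2
    #   replicated:      2 layernorms (4H) + 2 row-parallel biases (2H)
    per_layer = (12 * H * H + 7 * H) // TP + 6 * H

    middle_stage = 6 * per_layer                    # stages 1 and 2: 6 layers
    embed_stage = middle_stage + VOCAB * H // TP    # + embedding shard
    last_stage = embed_stage + 2 * H                # + final MixedFusedLayerNorm

    assert middle_stage == 75_592_704      # matches ranks (*, 1) and (*, 2)
    assert embed_stage == 101_544_960      # matches ranks (*, 0)
    assert last_stage == 101_549_056       # matches ranks (*, 3)

The same shape explains the TOTAL_PARAMS vs UNIQUE_PARAMS gap printed further down: 1417117696 - 1313308672 = 103809024 = 50688 * 2048, i.e. exactly one extra copy of the tied embedding, which the first and last pipeline stages both hold (the UserWarning later in this log makes the same point).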
[2021-10-04 10:34:52,978] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-04 10:34:53,045] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-04 10:34:53,046] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-04 10:34:53,046] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-04 10:34:53,046] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-04 10:34:53,046] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-04 10:34:53,046] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-04 10:34:53,046] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-04 10:34:53,046] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-04 10:34:53,046] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-04 10:34:53,046] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-04 10:34:53,281] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-04 10:34:53,281] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-04 10:34:53,281] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-04 10:34:53,281] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   allreduce_always_fp32 ........ False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   amp_enabled .................. False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   amp_params ................... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   checkpoint_tag_validation_enabled  True
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   checkpoint_tag_validation_fail  False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   disable_allgather ............ False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   dump_state ................... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_enabled ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_gas_boundary_resolution  1
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_layer_num ......... 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_max_iter .......... 100
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_stability ......... 1e-06
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_tol ............... 0.01
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   eigenvalue_verbose ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   elasticity_enabled ........... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   fp16_enabled ................. True
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   fp16_mixed_quantize .......... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   global_rank .................. 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_accumulation_steps .. 16
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_clipping ............ 1.0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   gradient_predivide_factor .... 1.0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   initial_dynamic_scale ........ 4096
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   loss_scale ................... 0
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   memory_breakdown ............. False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_legacy_fusion ...... False
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_name ............... None
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   optimizer_params ............. None
[2021-10-04 10:34:53,282] [INFO] [config.py:904:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   pld_enabled .................. False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   pld_params ................... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   prescale_gradients ........... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_change_rate ......... 0.001
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_groups .............. 1
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_offset .............. 1000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_period .............. 1000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_rounding ............ 0
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_start_bits .......... 16
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_target_bits ......... 8
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_training_enabled .... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_type ................ 0
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   quantize_verbose ............. False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   scheduler_name ............... None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   scheduler_params ............. None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   sparse_attention ............. None
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   sparse_gradients_enabled ..... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   steps_per_print .............. 2000
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_enabled .......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   tensorboard_output_path ......
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   train_batch_size ............. 512
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   train_micro_batch_size_per_gpu  8
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   use_quantizer_kernel ......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   wall_clock_breakdown ......... False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   world_size ................... 4
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_allow_untested_optimizer  False
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_enabled ................. True
[2021-10-04 10:34:53,283] [INFO] [config.py:904:print]   zero_optimization_stage ...... 1
[2021-10-04 10:34:53,283] [INFO] [config.py:906:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-10-04 10:34:53,284] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-04 10:34:53,574] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 44
[... one "successfully loaded 4 ZeRO state_dicts" line per rank, in arrival order; all 64 ranks (0-63) report success ...]
loading 4 zero partition checkpoints for rank 44
[... one "loading 4 zero partition checkpoints" line per rank; all 64 ranks load their partitions ...]
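Two consistency checks on the numbers above. ZeRO stage 1 shards only the optimizer state across the data-parallel group, which here has size 4; that is why every rank reads back exactly 4 ZeRO state_dicts. And the batch geometry printed in the config is self-consistent: 8 samples per micro-batch, 16 micro-batches per step, 4 data-parallel replicas. A sketch of both bookkeeping rules (the byte count is a rough estimate that ignores the further tensor/pipeline splits):

    DP = 4                # data-parallel degree (world_size in the config)
    MICRO, GAS = 8, 16    # train_micro_batch_size_per_gpu, grad. accumulation
    assert MICRO * GAS * DP == 512        # train_batch_size

    # ZeRO-1: each DP rank owns 1/DP of the fp32 master weights plus Adam
    # momentum and variance (~12 bytes/param); params and grads stay replicated.
    unique_params = 1_313_308_672
    opt_state_per_dp_shard = unique_params * 12 // DP
    print(f"~{opt_state_per_dp_shard / 2**30:.1f} GiB optimizer state per DP shard")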
 checkpoint version 3.0
  successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 99586
time (ms) | load-checkpoint: 1979.53
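For reference, the resume above goes through DeepSpeed's engine API. A minimal sketch of that call with hypothetical names (the actual wrapper lives in Megatron-DeepSpeed's megatron/checkpointing.py and also restores RNG and scheduler state):

    def resume(engine, load_dir):
        """Restore a DeepSpeedEngine from load_dir and return the iteration.

        `engine` is the object returned by deepspeed.initialize(); with ZeRO
        stage 1 the call also reads the per-DP-rank optimizer partitions --
        the "4 ZeRO state_dicts" logged above.
        """
        load_path, client_state = engine.load_checkpoint(load_dir)
        if load_path is None:
            raise RuntimeError(f"no checkpoint found under {load_dir}")
        # 'iteration' is whatever the training script stored as client state
        # when saving -- a hypothetical key, shown for illustration only.
        return client_state.get("iteration", 0)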
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
[... the same UserWarning is emitted by every rank; the remaining copies are omitted ...]
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters: 1.209483264
[... one "estimated model parameters" line per rank; only these three values occur ...]
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[... one "estimated model parameters without embeddings" line per rank; only these two values occur ...]
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-04 10:34:55
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 5.231784 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.310 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.212 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.040 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-04 10:35:05
done with setup ...
training ...
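The document split above is the usual weighted split over the 304230423 OSCAR documents. The exact --split string isn't shown in this excerpt, but weights of 949/50/1 reproduce the logged boundaries to within rounding. A sketch of the mechanism (it mirrors Megatron's get_train_valid_test_split_, not a verbatim copy):

    def split_indices(weights, num_docs):
        # normalize the weights, then take cumulative rounded boundaries
        total = sum(weights)
        bounds, acc = [0], 0.0
        for w in weights:
            acc += w / total
            bounds.append(min(num_docs, round(acc * num_docs)))
        return bounds

    print(split_indices([949, 50, 1], 304_230_423))
    # -> within +/-1 of the logged [0, 288714672, 303926193, 304230423]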
time (ms) | model-and-optimizer-setup: 3647.99 | train/valid/test-data-iterators-setup: 9556.77
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-10-04 10:35:05
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-04 10:35:06,003] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
[Rank 48] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6694.0 | max reserved: 6694.0
[Rank 51] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7230.0 | max reserved: 7230.0
[Rank 33] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4220.0 | max reserved: 4220.0
[Rank 17] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 49] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6518.0 | max reserved: 6518.0
[Rank 1] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
iteration 99600/ 152972 | consumed samples: 45915584 | elapsed time per iteration (ms): 6796.4 | learning rate: 6.877E-05 | global batch size: 512 | lm loss: 2.799741E+00 | loss scale: 131072.0 | grad norm: 10223.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[Rank 35] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 3] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5574.0 | max reserved: 5574.0
[Rank 19] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4796.0 | max reserved: 4796.0
[Rank 2] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5318.0 | max reserved: 5318.0
[Rank 18] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4716.0 | max reserved: 4716.0
[Rank 34] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4332.0 | max reserved: 4332.0
[Rank 50] (after 99600 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6534.0 | max reserved: 6534.0
[Rank 32] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 16] (after 99600 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 0] (after 99600 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5430.0 | max reserved: 5430.0
iteration 99800/ 152972 | consumed samples: 46017984 | elapsed time per iteration (ms): 5931.6 | learning rate: 6.838E-05 | global batch size: 512 | lm loss: 2.790099E+00 | loss scale: 131072.0 | grad norm: 10895.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 11:16:15,307] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=220, lr=[6.799779725317993e-05, 6.799779725317993e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 100000/ 152972 | consumed samples: 46120384 | elapsed time per iteration (ms): 5939.5 | learning rate: 6.800E-05 | global batch size: 512 | lm loss: 2.789123E+00 | loss scale: 131072.0 | grad norm: 11336.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
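The per-rank memory lines above are the standard CUDA caching-allocator counters, converted to MB. A minimal sketch of how such a line is typically produced (modeled on, not copied from, Megatron's report_memory helper):

    import torch

    mb = 1 << 20  # bytes per MB
    print(f"allocated: {torch.cuda.memory_allocated() / mb} "
          f"| max allocated: {torch.cuda.max_memory_allocated() / mb} "
          f"| reserved: {torch.cuda.memory_reserved() / mb} "
          f"| max reserved: {torch.cuda.max_memory_reserved() / mb}")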
steps: 100000 loss: 2.8137 iter time (s): 0.003 samples/sec: 172825.796
--------------------------------------------------------------------------------------------------
validation loss at iteration 100000 | lm loss value: 2.738130E+00 | lm loss PPL: 1.545805E+01 |
--------------------------------------------------------------------------------------------------
iteration 100200/ 152972 | consumed samples: 46222784 | elapsed time per iteration (ms): 6843.4 | learning rate: 6.761E-05 | global batch size: 512 | lm loss: 2.789294E+00 | loss scale: 262144.0 | grad norm: 24014.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100400/ 152972 | consumed samples: 46325184 | elapsed time per iteration (ms): 5926.8 | learning rate: 6.723E-05 | global batch size: 512 | lm loss: 2.791195E+00 | loss scale: 262144.0 | grad norm: 24384.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 12:08:44,662] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step100500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1688.16
iteration 100600/ 152972 | consumed samples: 46427584 | elapsed time per iteration (ms): 5947.2 | learning rate: 6.685E-05 | global batch size: 512 | lm loss: 2.791104E+00 | loss scale: 524288.0 | grad norm: 49476.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 100800/ 152972 | consumed samples: 46529984 | elapsed time per iteration (ms): 5919.1 | learning rate: 6.646E-05 | global batch size: 512 | lm loss: 2.790840E+00 | loss scale: 524288.0 | grad norm: 48632.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101000/ 152972 | consumed samples: 46632384 | elapsed time per iteration (ms): 5919.8 | learning rate: 6.608E-05 | global batch size: 512 | lm loss: 2.791100E+00 | loss scale: 524288.0 | grad norm: 52437.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 101000 | lm loss value: 2.741456E+00 | lm loss PPL: 1.550954E+01 |
--------------------------------------------------------------------------------------------------
iteration 101200/ 152972 | consumed samples: 46734784 | elapsed time per iteration (ms): 6787.9 | learning rate: 6.571E-05 | global batch size: 512 | lm loss: 2.793956E+00 | loss scale: 524288.0 | grad norm: 50857.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101400/ 152972 | consumed samples: 46837184 | elapsed time per iteration (ms): 5912.4 | learning rate: 6.533E-05 | global batch size: 512 | lm loss: 2.793494E+00 | loss scale: 131072.0 | grad norm: 13392.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 101600/ 152972 | consumed samples: 46939584 | elapsed time per iteration (ms): 5922.2 | learning rate: 6.495E-05 | global batch size: 512 | lm loss: 2.790947E+00 | loss scale: 131072.0 | grad norm: 12578.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
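The validation PPL column is simply exp(lm loss), which the numbers above confirm:

    import math
    print(math.exp(2.738130))  # 15.4580..., logged as lm loss PPL: 1.545805E+01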
iteration 101800/ 152972 | consumed samples: 47041984 | elapsed time per iteration (ms): 5926.6 | learning rate: 6.457E-05 | global batch size: 512 | lm loss: 2.798193E+00 | loss scale: 65536.0 | grad norm: 6465.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 14:39:37,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=226, lr=[6.419348005006784e-05, 6.419348005006784e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 102000/ 152972 | consumed samples: 47144384 | elapsed time per iteration (ms): 5906.1 | learning rate: 6.419E-05 | global batch size: 512 | lm loss: 2.797245E+00 | loss scale: 65536.0 | grad norm: 6424.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 102000 loss: 2.7841 iter time (s): 0.003 samples/sec: 173427.068
--------------------------------------------------------------------------------------------------
validation loss at iteration 102000 | lm loss value: 2.740548E+00 | lm loss PPL: 1.549548E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 14:42:30,048] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step102000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1671.66
iteration 102200/ 152972 | consumed samples: 47246784 | elapsed time per iteration (ms): 6789.1 | learning rate: 6.382E-05 | global batch size: 512 | lm loss: 2.794001E+00 | loss scale: 65536.0 | grad norm: 6316.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102400/ 152972 | consumed samples: 47349184 | elapsed time per iteration (ms): 5916.4 | learning rate: 6.344E-05 | global batch size: 512 | lm loss: 2.792696E+00 | loss scale: 131072.0 | grad norm: 12878.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102600/ 152972 | consumed samples: 47451584 | elapsed time per iteration (ms): 5917.7 | learning rate: 6.306E-05 | global batch size: 512 | lm loss: 2.791871E+00 | loss scale: 131072.0 | grad norm: 13187.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 102800/ 152972 | consumed samples: 47553984 | elapsed time per iteration (ms): 5913.3 | learning rate: 6.269E-05 | global batch size: 512 | lm loss: 2.794280E+00 | loss scale: 262144.0 | grad norm: 25379.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103000/ 152972 | consumed samples: 47656384 | elapsed time per iteration (ms): 5920.7 | learning rate: 6.231E-05 | global batch size: 512 | lm loss: 2.796042E+00 | loss scale: 262144.0 | grad norm: 23912.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 103000 | lm loss value: 2.741002E+00 | lm loss PPL: 1.550250E+01 |
--------------------------------------------------------------------------------------------------
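Two quick consistency checks on the iteration lines: the consumed-samples column advances by exactly one global batch per iteration, and the elapsed time per iteration implies the end-to-end throughput (the "samples/sec" figure printed next to "steps:" appears to come from DeepSpeed's internal step timer and does not match this wall-clock number):

    # consumed samples: 200 logged iterations x global batch size 512
    print(200 * 512)            # 102400
    print(47246784 - 47144384)  # 102400, matching iterations 102000 -> 102200

    # wall-clock throughput implied by "elapsed time per iteration (ms): 5916.4"
    print(512 / 5.9164)         # ~86.5 sequences/s
    print(512 * 2048 / 5.9164)  # ~177k tokens/s at sequence length 2048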
iteration 103200/ 152972 | consumed samples: 47758784 | elapsed time per iteration (ms): 6812.4 | learning rate: 6.194E-05 | global batch size: 512 | lm loss: 2.794200E+00 | loss scale: 262144.0 | grad norm: 26086.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103400/ 152972 | consumed samples: 47861184 | elapsed time per iteration (ms): 5922.9 | learning rate: 6.157E-05 | global batch size: 512 | lm loss: 2.794377E+00 | loss scale: 524288.0 | grad norm: 49300.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 17:13:29,616] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step103500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1715.68
iteration 103600/ 152972 | consumed samples: 47963584 | elapsed time per iteration (ms): 5939.3 | learning rate: 6.120E-05 | global batch size: 512 | lm loss: 2.790358E+00 | loss scale: 262144.0 | grad norm: 24830.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 103800/ 152972 | consumed samples: 48065984 | elapsed time per iteration (ms): 5935.8 | learning rate: 6.083E-05 | global batch size: 512 | lm loss: 2.788473E+00 | loss scale: 262144.0 | grad norm: 26025.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 18:02:56,776] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=228, lr=[6.046040529407516e-05, 6.046040529407516e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 104000 loss: 2.7937 iter time (s): 0.003 samples/sec: 172550.370
iteration 104000/ 152972 | consumed samples: 48168384 | elapsed time per iteration (ms): 5928.3 | learning rate: 6.046E-05 | global batch size: 512 | lm loss: 2.792356E+00 | loss scale: 524288.0 | grad norm: 49369.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 104000 | lm loss value: 2.741573E+00 | lm loss PPL: 1.551137E+01 |
--------------------------------------------------------------------------------------------------
iteration 104200/ 152972 | consumed samples: 48270784 | elapsed time per iteration (ms): 6801.2 | learning rate: 6.009E-05 | global batch size: 512 | lm loss: 2.793087E+00 | loss scale: 524288.0 | grad norm: 49662.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104400/ 152972 | consumed samples: 48373184 | elapsed time per iteration (ms): 5942.3 | learning rate: 5.973E-05 | global batch size: 512 | lm loss: 2.793109E+00 | loss scale: 524288.0 | grad norm: 60846.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 104600/ 152972 | consumed samples: 48475584 | elapsed time per iteration (ms): 5929.0 | learning rate: 5.936E-05 | global batch size: 512 | lm loss: 2.794067E+00 | loss scale: 1048576.0 | grad norm: 101329.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
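The loss-scale column shows dynamic loss scaling at work: the scale doubles after a sustained run of overflow-free steps (65536 -> 131072 -> 262144 -> 524288 across the lines above) and is halved whenever gradients overflow, with the offending step counted in the "skipped" field of the step lines (220 at step 100000, 228 at step 104000). A minimal sketch of that mechanism; the 1000-step window is an assumption, not a value read from this log:

    class DynamicLossScaler:
        def __init__(self, scale=131072.0, window=1000):
            self.scale, self.window, self.good_steps = scale, window, 0

        def update(self, overflow: bool):
            if overflow:                      # step is skipped, scale backs off
                self.scale = max(self.scale / 2.0, 1.0)
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.window:
                    self.scale *= 2.0         # e.g. 131072 -> 262144 in the log
                    self.good_steps = 0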
iteration 104800/ 152972 | consumed samples: 48577984 | elapsed time per iteration (ms): 5931.0 | learning rate: 5.900E-05 | global batch size: 512 | lm loss: 2.794277E+00 | loss scale: 262144.0 | grad norm: 27798.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105000/ 152972 | consumed samples: 48680384 | elapsed time per iteration (ms): 5925.1 | learning rate: 5.863E-05 | global batch size: 512 | lm loss: 2.789887E+00 | loss scale: 131072.0 | grad norm: 12532.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 105000 | lm loss value: 2.737305E+00 | lm loss PPL: 1.544530E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 19:47:32,369] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step105000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1573.31
iteration 105200/ 152972 | consumed samples: 48782784 | elapsed time per iteration (ms): 6781.9 | learning rate: 5.827E-05 | global batch size: 512 | lm loss: 2.793075E+00 | loss scale: 131072.0 | grad norm: 12201.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105400/ 152972 | consumed samples: 48885184 | elapsed time per iteration (ms): 5929.1 | learning rate: 5.790E-05 | global batch size: 512 | lm loss: 2.786700E+00 | loss scale: 131072.0 | grad norm: 12600.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105600/ 152972 | consumed samples: 48987584 | elapsed time per iteration (ms): 5930.4 | learning rate: 5.754E-05 | global batch size: 512 | lm loss: 2.788983E+00 | loss scale: 262144.0 | grad norm: 27830.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 105800/ 152972 | consumed samples: 49089984 | elapsed time per iteration (ms): 5932.0 | learning rate: 5.718E-05 | global batch size: 512 | lm loss: 2.791403E+00 | loss scale: 131072.0 | grad norm: 13501.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-04 21:26:23,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=235, lr=[5.682251394039283e-05, 5.682251394039283e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 106000/ 152972 | consumed samples: 49192384 | elapsed time per iteration (ms): 5933.1 | learning rate: 5.682E-05 | global batch size: 512 | lm loss: 2.789714E+00 | loss scale: 131072.0 | grad norm: 12612.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 106000 loss: 2.8342 iter time (s): 0.003 samples/sec: 172935.911
--------------------------------------------------------------------------------------------------
validation loss at iteration 106000 | lm loss value: 2.737068E+00 | lm loss PPL: 1.544165E+01 |
--------------------------------------------------------------------------------------------------
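The learning-rate column decays smoothly (6.800E-05 at step 100000 down to 5.682E-05 at step 106000). Assuming Megatron's cosine decay style (an assumption; the schedule type, peak/minimum LR, and warmup extent are not visible in this excerpt), each value comes from a formula of this shape, with placeholder constants:

    import math

    def cosine_lr(step, max_lr=2e-4, min_lr=1e-5, warmup=2000, decay_steps=152972):
        # All four constants are PLACEHOLDERS, not values read from this run.
        if step < warmup:
            return max_lr * step / warmup
        frac = (step - warmup) / (decay_steps - warmup)
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))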
iteration 106200/ 152972 | consumed samples: 49294784 | elapsed time per iteration (ms): 6814.9 | learning rate: 5.646E-05 | global batch size: 512 | lm loss: 2.787916E+00 | loss scale: 131072.0 | grad norm: 12850.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106400/ 152972 | consumed samples: 49397184 | elapsed time per iteration (ms): 5929.5 | learning rate: 5.610E-05 | global batch size: 512 | lm loss: 2.787785E+00 | loss scale: 262144.0 | grad norm: 29147.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-04 22:18:46,270] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step106500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1451.59
iteration 106600/ 152972 | consumed samples: 49499584 | elapsed time per iteration (ms): 5942.5 | learning rate: 5.575E-05 | global batch size: 512 | lm loss: 2.786064E+00 | loss scale: 262144.0 | grad norm: 25576.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 106800/ 152972 | consumed samples: 49601984 | elapsed time per iteration (ms): 5929.3 | learning rate: 5.539E-05 | global batch size: 512 | lm loss: 2.790410E+00 | loss scale: 524288.0 | grad norm: 47572.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107000/ 152972 | consumed samples: 49704384 | elapsed time per iteration (ms): 5931.4 | learning rate: 5.503E-05 | global batch size: 512 | lm loss: 2.787986E+00 | loss scale: 524288.0 | grad norm: 55311.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 107000 | lm loss value: 2.732661E+00 | lm loss PPL: 1.537374E+01 |
--------------------------------------------------------------------------------------------------
iteration 107200/ 152972 | consumed samples: 49806784 | elapsed time per iteration (ms): 6814.6 | learning rate: 5.468E-05 | global batch size: 512 | lm loss: 2.786071E+00 | loss scale: 262144.0 | grad norm: 25799.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107400/ 152972 | consumed samples: 49909184 | elapsed time per iteration (ms): 5923.9 | learning rate: 5.433E-05 | global batch size: 512 | lm loss: 2.787072E+00 | loss scale: 262144.0 | grad norm: 26654.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107600/ 152972 | consumed samples: 50011584 | elapsed time per iteration (ms): 5935.6 | learning rate: 5.397E-05 | global batch size: 512 | lm loss: 2.784829E+00 | loss scale: 262144.0 | grad norm: 24746.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 107800/ 152972 | consumed samples: 50113984 | elapsed time per iteration (ms): 5945.1 | learning rate: 5.362E-05 | global batch size: 512 | lm loss: 2.784871E+00 | loss scale: 262144.0 | grad norm: 26067.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-05 00:50:03,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=240, lr=[5.327385668917195e-05, 5.327385668917195e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 108000 loss: 2.7790 iter time (s): 0.003 samples/sec: 172840.721
iteration 108000/ 152972 | consumed samples: 50216384 | elapsed time per iteration (ms): 5933.3 | learning rate: 5.327E-05 | global batch size: 512 | lm loss: 2.785659E+00 | loss scale: 131072.0 | grad norm: 12443.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 108000 | lm loss value: 2.735413E+00 | lm loss PPL: 1.541610E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 00:52:57,294] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step108000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1523.78
iteration 108200/ 152972 | consumed samples: 50318784 | elapsed time per iteration (ms): 6797.4 | learning rate: 5.292E-05 | global batch size: 512 | lm loss: 2.785091E+00 | loss scale: 131072.0 | grad norm: 13440.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108400/ 152972 | consumed samples: 50421184 | elapsed time per iteration (ms): 5927.1 | learning rate: 5.257E-05 | global batch size: 512 | lm loss: 2.783074E+00 | loss scale: 262144.0 | grad norm: 24041.914 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108600/ 152972 | consumed samples: 50523584 | elapsed time per iteration (ms): 5933.5 | learning rate: 5.223E-05 | global batch size: 512 | lm loss: 2.781478E+00 | loss scale: 262144.0 | grad norm: 24900.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 108800/ 152972 | consumed samples: 50625984 | elapsed time per iteration (ms): 5930.8 | learning rate: 5.188E-05 | global batch size: 512 | lm loss: 2.786335E+00 | loss scale: 262144.0 | grad norm: 26931.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109000/ 152972 | consumed samples: 50728384 | elapsed time per iteration (ms): 5927.2 | learning rate: 5.153E-05 | global batch size: 512 | lm loss: 2.781120E+00 | loss scale: 524288.0 | grad norm: 50342.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 109000 | lm loss value: 2.732274E+00 | lm loss PPL: 1.536779E+01 |
--------------------------------------------------------------------------------------------------
iteration 109200/ 152972 | consumed samples: 50830784 | elapsed time per iteration (ms): 6791.0 | learning rate: 5.119E-05 | global batch size: 512 | lm loss: 2.784352E+00 | loss scale: 262144.0 | grad norm: 24180.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109400/ 152972 | consumed samples: 50933184 | elapsed time per iteration (ms): 5932.2 | learning rate: 5.085E-05 | global batch size: 512 | lm loss: 2.783908E+00 | loss scale: 262144.0 | grad norm: 25218.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 03:24:05,071] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step109500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1575.38
iteration 109600/ 152972 | consumed samples: 51035584 | elapsed time per iteration (ms): 5942.8 | learning rate: 5.050E-05 | global batch size: 512 | lm loss: 2.783445E+00 | loss scale: 262144.0 | grad norm: 25448.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 109800/ 152972 | consumed samples: 51137984 | elapsed time per iteration (ms): 5926.4 | learning rate: 5.016E-05 | global batch size: 512 | lm loss: 2.784358E+00 | loss scale: 524288.0 | grad norm: 47834.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-05 04:13:32,378] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=242, lr=[4.9819865631476335e-05, 4.9819865631476335e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 110000 loss: 2.7673 iter time (s): 0.003 samples/sec: 172747.511
iteration 110000/ 152972 | consumed samples: 51240384 | elapsed time per iteration (ms): 5934.5 | learning rate: 4.982E-05 | global batch size: 512 | lm loss: 2.781886E+00 | loss scale: 524288.0 | grad norm: 51011.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 110000 | lm loss value: 2.730394E+00 | lm loss PPL: 1.533892E+01 |
--------------------------------------------------------------------------------------------------
iteration 110200/ 152972 | consumed samples: 51342784 | elapsed time per iteration (ms): 6804.5 | learning rate: 4.948E-05 | global batch size: 512 | lm loss: 2.781904E+00 | loss scale: 1048576.0 | grad norm: 123221.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110400/ 152972 | consumed samples: 51445184 | elapsed time per iteration (ms): 5943.6 | learning rate: 4.914E-05 | global batch size: 512 | lm loss: 2.781725E+00 | loss scale: 524288.0 | grad norm: 51531.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110600/ 152972 | consumed samples: 51547584 | elapsed time per iteration (ms): 5936.7 | learning rate: 4.881E-05 | global batch size: 512 | lm loss: 2.781758E+00 | loss scale: 524288.0 | grad norm: 52040.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 110800/ 152972 | consumed samples: 51649984 | elapsed time per iteration (ms): 5947.1 | learning rate: 4.847E-05 | global batch size: 512 | lm loss: 2.783094E+00 | loss scale: 1048576.0 | grad norm: 96763.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 111000/ 152972 | consumed samples: 51752384 | elapsed time per iteration (ms): 5941.4 | learning rate: 4.813E-05 | global batch size: 512 | lm loss: 2.782276E+00 | loss scale: 1048576.0 | grad norm: 97648.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 111000 | lm loss value: 2.728569E+00 | lm loss PPL: 1.531096E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 05:58:19,982] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step111000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1497.85
iteration 111200/ 152972 | consumed samples: 51854784 | elapsed time per iteration (ms): 6799.9 | learning rate: 4.780E-05 | global batch size: 512 | lm loss: 2.780392E+00 | loss scale: 1048576.0 | grad norm: 107811.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 111261 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 06:24:09,814] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step111261/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 111261 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1533.56
[exiting program after 1190.0240713556607 minutes] datetime: 2021-10-05 06:24:10
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-05 14:10:10.734171: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
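Each launcher process prints this report at startup (the heavy interleaving in the raw stream is the per-process copies overlapping); the same information can also be obtained outside a training run with DeepSpeed's ds_report command-line utility, which is a convenient pre-launch check that JIT prerequisites such as ninja are in place.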
DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled .. ...... compatiblecompatiblecompatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam [YES] ............... [YES]............... ...... [YES][YES][OKAY]...... ............[OKAY] [OKAY][OKAY] fused_adam ............. [NO] fused_adam....... .............[OKAY]fused_adamfused_adam ninjaninjaninjaninja .................................... .................. .................. [OKAY][OKAY] [OKAY] .............[NO]............. fused_lamb [NO][NO] ....... ............. ....... .......[OKAY] [NO] [OKAY] [OKAY] .......fused_lamb [OKAY].............fused_lamb [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- fused_lamb.............[NO] .............[NO]....... [NO].......[OKAY] sparse_attn.......[OKAY] ............[OKAY] --------------------------------------------------op nameop nameop name [NO] ....... [OKAY] ................op name................................ installedinstalled................installed installed...... compatible.. ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- transformer sparse_attn............ sparse_attn............[NO] sparse_attn[NO] ................... ....... ............ [OKAY][NO] [OKAY]....... [NO] [OKAY]....... stochastic_transformer transformer [OKAY] .transformer compatible compatible ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name ............ [NO]transformer............ [NO] ............ .......[NO] ....... [NO] [OKAY]....... [OKAY].......[OKAY] [OKAY] cpu_adam ...............cpu_adam cpu_adam [YES]............... ......cpu_adam ...............[YES][OKAY]............... op name ................ op name................................ installedinstalled................ installed.. .. .. 
installedcompatible compatible compatible..-------------------------------------------------- -------------------------------------------------- stochastic_transformerstochastic_transformer stochastic_transformer .. [NO].[NO] .......[NO]....... [OKAY].......[OKAY] [YES]......[YES] ............ [OKAY] [OKAY] [OKAY] fused_adam compatible-------------------------------------------------- -------------------------------------------------- [OKAY] ............. [NO] ....... fused_adam[OKAY] .............fused_adamfused_adam fused_lamb [NO]............. ............. ....................[NO] [NO] [NO] [OKAY] cpu_adamcpu_adam .............................. cpu_adam[YES][YES]cpu_adam ..................... ..................... [OKAY][YES] [OKAY] ....... ..............fused_lamb[OKAY] [OKAY][OKAY] ............. [YES] [NO]fused_lamb fused_lamb ....... ............. ............. [OKAY]sparse_attn [NO] ............ [OKAY][OKAY] fused_adam ............[NO] .......[NO]....... [OKAY].......[OKAY] fused_adam............. [NO]............. .......fused_adam[NO]fused_adam [OKAY]................................. [OKAY]sparse_attn [NO] [OKAY] fused_lamb[NO]....... ............ transformer[NO] ................... [NO]sparse_attn[OKAY] sparse_attn................... transformer [OKAY]............ [NO] ....... .............[OKAY]fused_lamb [OKAY] [NO] ............[NO]....... .......stochastic_transformer[NO][OKAY] ....................fused_lamb fused_lamb [NO][OKAY] ............. ....... .............[OKAY][NO] .[OKAY].......transformer [NO][OKAY]............ [NO]....... [OKAY] ....... sparse_attn[OKAY] transformer ....... [NO] stochastic_transformer............ .[OKAY]....... [NO] [NO] [OKAY] ....... ............ [NO] sparse_attn....... [OKAY]............ ....... [OKAY][OKAY] sparse_attn[NO] ............transformer....... [NO]............[OKAY] stochastic_transformer . stochastic_transformer[NO] ........ [NO][OKAY] ....... [OKAY] [NO] sparse_attn ..............transformer ............ [OKAY] [OKAY] ............[NO] transformer.......[NO] stochastic_transformer [OKAY].................... [NO][NO][OKAY]transformer ....... ....... ............ [OKAY]stochastic_transformer[OKAY][NO] ........stochastic_transformer [NO][OKAY]. .......[NO] .......[OKAY] stochastic_transformer[OKAY] . [NO] ....... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op nameop nameop name ................op name................................ installed................installedinstalled installed.. ......compatible compatiblecompatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam............... ...... .............................. [YES] [OKAY] [YES] [YES]...... ............[OKAY] [OKAY][OKAY] fused_adam ............. [NO] ....... fused_adam[OKAY] fused_adamfused_adam............. .............[NO]fused_lamb............. [NO].......[NO]............. ..............[OKAY] [NO] [OKAY] [OKAY] .......fused_lamb [OKAY].............fused_lambfused_lamb .............[NO]............. [NO].......[NO] ..............[OKAY]sparse_attn [OKAY][OKAY]............ [NO] ....... [OKAY] sparse_attntransformer sparse_attn ............ ........................[NO]sparse_attn [NO][NO]....... ............ .............. [OKAY] [NO] [OKAY][OKAY] transformer....... transformer ............ stochastic_transformer [OKAY]............[NO] .[NO]....... .......transformer[OKAY][NO] [OKAY]................... stochastic_transformer [NO] [OKAY]. stochastic_transformer ....... [NO] .[OKAY]....... [NO][OKAY] .......stochastic_transformer [OKAY]. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
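Every launched rank prints this op report at startup, so the run emits dozens of interleaved copies; a single reconstructed copy is kept above. For reference, a minimal sketch of querying the same compatibility information in-process, assuming a DeepSpeed 0.4.x install that exposes the op builders under `deepspeed.ops.op_builder` (the `ds_report` command-line tool prints the full table):

```python
# Hedged sketch: run the per-op compatibility checks behind the
# [OKAY] column above. Assumes deepspeed 0.4.x exposing these
# builder classes under deepspeed.ops.op_builder.
from deepspeed.ops.op_builder import (
    CPUAdamBuilder,
    FusedAdamBuilder,
    FusedLambBuilder,
    SparseAttnBuilder,
)

for builder in (CPUAdamBuilder(), FusedAdamBuilder(),
                FusedLambBuilder(), SparseAttnBuilder()):
    # is_compatible() checks the JIT-build prerequisites for the op;
    # [NO] under "installed" only means the wheel was built without it.
    print(f"{builder.NAME:<24} compatible={builder.is_compatible()}")
```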
[WARNING]  async_io requires the libraries: ['libaio-dev'] but they are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
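async_io resolves to [NO]/[NO] because the libaio headers are absent on these nodes; the op backs DeepSpeed's ZeRO-Infinity NVMe offload path, so its absence only matters if that feature is used. A minimal sketch of reproducing just that check, assuming the same DeepSpeed 0.4.x builder API:

```python
# Hedged sketch: reproduce the async_io compatibility check above.
# Assumes deepspeed 0.4.x exposing AsyncIOBuilder().is_compatible().
from deepspeed.ops.op_builder import AsyncIOBuilder

if not AsyncIOBuilder().is_compatible():
    # Matches the [WARNING] in the log: libaio-dev is missing, so the
    # op cannot be JIT compiled; on Debian/Ubuntu: apt install libaio-dev
    print("async_io unavailable until libaio-dev is installed")
```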
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
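Each rank prints the same environment block; one copy is kept above. A hedged sketch of gathering the equivalent fields from public `torch` and `deepspeed` attributes (the values in comments are the ones reported in this run; the nvcc call assumes nvcc is on PATH):

```python
# Hedged sketch: collect the fields of "DeepSpeed general environment
# info" from public attributes rather than DeepSpeed's own reporter.
import subprocess
import torch
import deepspeed

print("torch install path :", torch.__path__)         # conda hf-prod env here
print("torch version      :", torch.__version__)      # 1.8.1 in this run
print("torch cuda version :", torch.version.cuda)     # 11.1 in this run
print("deepspeed path     :", deepspeed.__path__)     # big-science checkout
print("deepspeed info     :", deepspeed.__version__)  # 0.4.2+72ce55a here
print("nvcc version       :",
      subprocess.run(["nvcc", "--version"], capture_output=True,
                     text=True).stdout.splitlines()[-1])  # release 11.2 here
```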
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
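`type: git: not found` means git is not on PATH on the compute nodes, so Megatron falls back to git_hash=unknown/git_branch=unknown; this is harmless for training. A hedged sketch of that kind of fallback, with `get_git_hash` as a hypothetical helper name (Megatron's actual helper may differ):

```python
# Hedged sketch of a git-info fallback like the one logged above.
# get_git_hash is a hypothetical name, not Megatron's actual function.
import subprocess

def get_git_hash() -> str:
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git missing on PATH (as on these compute nodes) or not a repo
        return "unknown"

print(f"**** Git info for Megatron: git_hash={get_git_hash()} ****")
```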
Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name ................op name ................ ................ installed................ installed installed installed.. .. .. ..compatible compatible compatible compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam...............cpu_adam cpu_adam ...............[YES] ............... ...............[YES] ...... [YES] [YES] [OKAY]...... ...... ......[OKAY][OKAY] [OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam .................... ............. ............. [NO][OKAY] [NO] [NO] ....... ....... ....... [OKAY]fused_lamb [OKAY][OKAY]............. [NO] .......fused_lambfused_lambfused_lamb [OKAY]....................................... [NO][NO][NO] ..................... [OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... sparse_attn[OKAY]sparse_attn ............sparse_attn............ ............transformer[NO][NO] [NO] ................... ....... ....... [NO][OKAY] [OKAY] [OKAY] ....... transformer[OKAY]transformer transformer........................ ............[NO][NO] stochastic_transformer [NO]....... ....... ........[OKAY][OKAY] [OKAY][NO] ....... stochastic_transformer[OKAY]stochastic_transformer stochastic_transformer .. .[NO][NO] [NO] ....... .............. [OKAY][OKAY][OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op name op name ................ op name................................ installed ................installed ..installed .. installed ..compatible compatible ..--------------------------------------------------compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adamcpu_adam............... [OKAY][YES].............................. /bin/sh: line 0: type: git: not found ......[YES] [YES] [OKAY] ............ [OKAY]fused_adam[OKAY] ............. [NO] .......fused_adam [OKAY]............. [NO]fused_adam ....... fused_adam fused_lamb.............[OKAY] [NO].......................... fused_lamb[NO][NO]....... .................... ....... [OKAY] [OKAY] [NO][OKAY] .......fused_lamb [OKAY]fused_lamb............. .............[NO] [NO]....... .......sparse_attn[OKAY] [OKAY]............ [NO]sparse_attn ................... [OKAY][NO] .......transformer sparse_attn............[OKAY] [NO] sparse_attn...................transformer [NO]............[OKAY] ............ .......[NO][NO] [OKAY]stochastic_transformer.............. .transformer[OKAY][OKAY] ............ [NO] [NO].......stochastic_transformer transformer[OKAY]....... . ............[NO][OKAY] [NO]....... [OKAY].......stochastic_transformer [OKAY]. [NO] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op nameop name................op name installed................................................ ..installedinstalledinstalled compatible .. .... compatiblecompatible--------------------------------------------------compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ............................................. [YES][YES] [YES][YES] .................. ......[OKAY][OKAY] [OKAY] [OKAY] fused_adam fused_adam.............fused_adam fused_adam............. [NO] ..........................[NO] .......[NO][NO]....... ....... .......[OKAY] [OKAY][OKAY] [OKAY] fused_lambfused_lambfused_lamb fused_lamb .......................... ............. [NO]............. [NO][NO] .......[NO] [OKAY] ..................... [OKAY][OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attnsparse_attn transformersparse_attn........................ ............[NO]............[NO] [NO][NO] ....... .............. ....... [OKAY][OKAY] [OKAY] [OKAY] transformertransformertransformer stochastic_transformer ........................ ............ .[NO] [NO] [NO][NO] ....... ....... ..............[OKAY] [OKAY][OKAY][OKAY] stochastic_transformerstochastic_transformer stochastic_transformer .. .[NO] [NO] [NO] ..................... 
[OKAY][OKAY] [OKAY] /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** ninjaninjaninjaninja .................................... [OKAY]....................................[OKAY] [OKAY]--------------------------------------------------[OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name ................ op name................................installed ................installed..installed .. installedcompatible .. .. compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES] .............................. ............... ......[YES] [YES] [YES] [OKAY]...... ............ [OKAY][OKAY][OKAY] fused_adam ............. fused_adam[NO]fused_adam fused_adam ............. .................... ............. [NO][OKAY][NO][NO] ..................... fused_lamb [OKAY][OKAY][OKAY] ............. [NO]fused_lamb fused_lamb ....... fused_lamb.............[OKAY]............. .............[NO][NO] [NO].............. .......[OKAY][OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY]sparse_attn sparse_attnsparse_attntransformer ............ ........................ ............[NO][NO] [NO][NO] ....... .............. ....... [OKAY][OKAY] [OKAY] [OKAY] transformertransformertransformer ....................................stochastic_transformer [NO] [NO].[NO]....... ....... [OKAY].......[NO] [OKAY][OKAY] stochastic_transformer....... stochastic_transformer[OKAY] stochastic_transformer. . .[NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... ..................[OKAY][OKAY][OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op nameop name op name................op name................ installed ................................installed .. installed ..installed compatible .. compatible.. -------------------------------------------------- compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam ............... ............... ..................... [YES] [YES] [YES][OKAY] ...... ...... ......[OKAY] [OKAY][OKAY] fused_adam ............. [NO] fused_adamfused_adam....... fused_adam .......................... [OKAY] .............[NO][NO] .......fused_lamb[NO] ...........................[OKAY] [NO] [OKAY] [OKAY] ....... fused_lamb [OKAY]fused_lamb ............. fused_lamb............. [NO].............[NO] ..............[NO] [OKAY][OKAY]....... sparse_attn [OKAY]............ [NO] ....... [OKAY] sparse_attnsparse_attn transformer ............ ............ ............ sparse_attn[NO][NO] [NO].......................... .......[OKAY][OKAY] [NO] [OKAY] transformer .......transformer............ [NO]stochastic_transformer[OKAY]............ ........[NO]transformer [OKAY] [NO] ................... .......[OKAY][NO] stochastic_transformer [OKAY] . .......stochastic_transformer [NO] [OKAY] ........ stochastic_transformer[NO][OKAY] ........ [NO][OKAY] ....... [OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO]async_io ...................... [NO] .......[NO] [NO] transformer_inferencetransformer_inference .. [NO].. .......[NO] [OKAY]....... [OKAY] utils .................. utils[YES] ...... [OKAY].................. [YES] ...... quantizer[OKAY] .............. [NO] ....... [OKAY] quantizer ..............-------------------------------------------------- [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
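async_io is the only op reported as unbuildable, and it is only needed for DeepSpeed's NVMe/disk offload paths, which this run does not use, so the libaio-dev warning is harmless here. On a cluster where `apt install` is not an option, a quick way to probe the op from Python is sketched below (AsyncIOBuilder and its is_compatible() check are assumed from the deepspeed 0.4.x op_builder API):

    # Sketch: ask DeepSpeed whether the async_io op could be JIT-built here.
    from deepspeed.ops.op_builder import AsyncIOBuilder  # assumed import path

    # is_compatible() returns False when the libaio headers/libraries are
    # missing, which is what "async_io ... [NO] ....... [NO]" above reports.
    print("async_io buildable:", AsyncIOBuilder().is_compatible())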
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1269461.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-05 14:10:21,298] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
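A few of the derived numbers above are easy to sanity-check: the 64-way world size factors into tensor-parallel 4 × pipeline-parallel 4 × data-parallel 4; the padded vocab comes from rounding 50257 up to a multiple of make_vocab_size_divisible_by × tensor_model_parallel_size = 128 × 4 = 512; and rampup_batch_size = ['32', '32', '2_000_000'] encodes the 32 → 512 ramp announced right after the argument dump. A minimal sketch of those three computations (plain Python written for this note, not Megatron's code; its exact rounding of the rampup schedule may differ):

    import math

    # World-size decomposition: data-parallel size is what remains after
    # tensor and pipeline parallelism are carved out of the 64 GPUs.
    world_size, tp, pp = 64, 4, 4
    dp = world_size // (tp * pp)
    assert dp == 4  # matches "data-parallel-size: 4"

    # Vocab padding: round up to a multiple of (make_vocab_size_divisible_by * tp).
    orig_vocab, divisible_by = 50257, 128
    multiple = divisible_by * tp  # 512
    padded_vocab = math.ceil(orig_vocab / multiple) * multiple
    assert padded_vocab == 50688 and padded_vocab - orig_vocab == 431

    # Batch-size rampup: start at 32, grow by 32 until the target 512,
    # spreading the growth over 2_000_000 consumed samples.
    start, increment, target, rampup_samples = 32, 32, 512, 2_000_000
    num_increments = (target - start) // increment  # 15 steps of +32
    print(f"{num_increments} increments, ~{rampup_samples / num_increments:.0f} samples each")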
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.282 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Building extension module scaled_upper_triang_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Building extension module scaled_masked_softmax_cuda...
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Building extension module fused_mix_prec_layer_norm_cuda...
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 11.712 seconds
time to initialize megatron (seconds): 57.295
[after megatron is initialized] datetime: 2021-10-05 14:10:33
building GPT model ...
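For reference, these fused kernels are JIT-built through torch.utils.cpp_extension.load, which is what emits the ninja and compiler messages above. A minimal sketch of that mechanism (the source file names and compiler flags here are illustrative, not the actual Megatron-DeepSpeed build list):

```python
from torch.utils.cpp_extension import load

# JIT-compile and import a mixed C++/CUDA extension; ninja caches the build,
# which is why reruns print "ninja: no work to do."
scaled_masked_softmax = load(
    name="scaled_masked_softmax_cuda",
    sources=[                                      # illustrative file names
        "scaled_masked_softmax.cpp",
        "scaled_masked_softmax_cuda.cu",
    ],
    extra_cuda_cflags=["-O3", "--use_fast_math"],  # assumed flags
    verbose=True,
)
```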
[2021-10-05 14:10:33,408] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
[2021-10-05 14:10:33,411] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-05 14:10:33,411] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.17 GB, percent = 19.3%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ..., ProcessCoord(pipe=3, data=3, model=3): 63} (all 64 ranks; the model coordinate varies fastest, then data, then pipe)
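The full 64-entry mapping follows a simple rule. A small sketch (my own helper, not Megatron code) that regenerates it for PP=4, DP=4, TP=4:

```python
from itertools import product

PP, DP, TP = 4, 4, 4  # pipeline, data, tensor parallel sizes; PP * DP * TP = world_size = 64

def rank_of(pipe, data, model, dp=DP, tp=TP):
    # model varies fastest, then data, then pipe, matching the topology printed above
    return pipe * dp * tp + data * tp + model

assert rank_of(0, 0, 0) == 0
assert rank_of(1, 0, 0) == 16   # first rank of pipeline stage 1
assert rank_of(3, 3, 3) == 63

topology = {(p, d, m): rank_of(p, d, m)
            for p, d, m in product(range(PP), range(DP), range(TP))}
```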
[2021-10-05 14:10:33,943] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=10
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28: MixedFusedLayerNorm
    29: EmbeddingPipe
    30: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
[2021-10-05 14:10:34,279] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-05 14:10:34,280] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-05 14:10:34,280] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.56 GB, percent = 19.5%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
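These per-stage counts are consistent with the model's shape (hidden size 2048, 24 layers, padded vocab 50688). A back-of-envelope check; the 12h^2 + 13h per-layer count is a standard rough formula, not something the log prints:

```python
h, n_layers, vocab = 2048, 24, 50688

per_layer = 12 * h * h + 13 * h   # attention + MLP weights, biases, layernorms (approximate)
embedding = vocab * h             # 103_809_024

# Middle pipeline stages: 6 transformer layers split over 4 tensor-parallel ranks
print(6 * per_layer // 4)         # 75_537_408, vs 75_592_704 logged (within ~0.1%)

# Unique parameters across the whole model (plus the final layernorm)
print(n_layers * per_layer + embedding + 2 * h)  # ~1.312e9, vs UNIQUE_PARAMS=1_313_308_672

# TOTAL_PARAMS minus UNIQUE_PARAMS is exactly one embedding matrix, consistent
# with the tied embedding being held by both the first and last pipeline stage.
print(1_417_117_696 - 1_313_308_672 == embedding)  # True
```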
[2021-10-05 14:10:34,301] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-05 14:10:34,366] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-05 14:10:34,366] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-05 14:10:34,366] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-05 14:10:34,366] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-05 14:10:34,366] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-05 14:10:34,366] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-05 14:10:34,366] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-05 14:10:34,366] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-05 14:10:34,366] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-05 14:10:34,366] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-05 14:10:34,616] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-05 14:10:34,616] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-05 14:10:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-05 14:10:34,616] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
  activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
  aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
  allreduce_always_fp32 ........ False
  amp_enabled .................. False
  amp_params ................... False
  checkpoint_tag_validation_enabled True
  checkpoint_tag_validation_fail False
  disable_allgather ............ False
  dump_state ................... False
  dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
  eigenvalue_enabled ........... False
  eigenvalue_gas_boundary_resolution 1
  eigenvalue_layer_name ........ bert.encoder.layer
  eigenvalue_layer_num ......... 0
  eigenvalue_max_iter .......... 100
  eigenvalue_stability ......... 1e-06
  eigenvalue_tol ............... 0.01
  eigenvalue_verbose ........... False
  elasticity_enabled ........... False
  flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
  fp16_enabled ................. True
  fp16_mixed_quantize .......... False
  global_rank .................. 0
  gradient_accumulation_steps .. 16
  gradient_clipping ............ 1.0
  gradient_predivide_factor .... 1.0
  initial_dynamic_scale ........ 4096
  loss_scale ................... 0
  memory_breakdown ............. False
  optimizer_legacy_fusion ...... False
  optimizer_name ............... None
  optimizer_params ............. None
  pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
  pld_enabled .................. False
  pld_params ................... False
  prescale_gradients ........... False
  quantize_change_rate ......... 0.001
  quantize_groups .............. 1
  quantize_offset .............. 1000
  quantize_period .............. 1000
  quantize_rounding ............ 0
  quantize_start_bits .......... 16
  quantize_target_bits ......... 8
  quantize_training_enabled .... False
  quantize_type ................ 0
  quantize_verbose ............. False
  scheduler_name ............... None
  scheduler_params ............. None
  sparse_attention ............. None
  sparse_gradients_enabled ..... False
  steps_per_print .............. 2000
  tensorboard_enabled .......... False
  tensorboard_job_name ......... DeepSpeedJobName
  tensorboard_output_path ......
  train_batch_size ............. 512
  train_micro_batch_size_per_gpu 8
  use_quantizer_kernel ......... False
  wall_clock_breakdown ......... False
  world_size ................... 4
  zero_allow_untested_optimizer False
  zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
  zero_enabled ................. True
  zero_optimization_stage ...... 1
  json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": { "stage": 1 },
    "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
  }
[2021-10-05 14:10:34,619] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-05 14:10:34,934] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
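The checkpoint values describe a sample-based schedule: linear warmup over the first 183105 samples (lr_warmup_samples in the arguments), then cosine decay to min_lr over the 73242187 total training samples; despite the word "iterations" in the messages, both counts match the sample-based arguments. A sketch of that shape as I read it (my own function, not Megatron's scheduler):

```python
import math

def lr_at(sample, max_lr=2.0e-4, min_lr=1e-5, warmup=183_105, total=73_242_187):
    """Linear warmup then cosine decay, driven by consumed samples."""
    if sample < warmup:
        return max_lr * sample / warmup
    progress = (sample - warmup) / (total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Sanity checks against the logged configuration:
assert 8 * 16 * 4 == 512   # micro_batch_size * gradient_accumulation_steps * DP = train_batch_size
print(lr_at(183_105))      # 2e-4, peak right after warmup
print(lr_at(73_242_187))   # 1e-5, floor at the end of training
```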
successfully loaded 4 ZeRO state_dicts for ranks 0-63
loading 4 zero partition checkpoints for ranks 0-63
checkpoint version 3.0
 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 111261
time (ms) | load-checkpoint: 2110.65
/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
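zero_stage=1 shards only the optimizer state across the data-parallel group (DP=4 here); fp16 weights and gradients stay replicated within each tensor/pipeline shard. A rough per-GPU memory estimate under the usual mixed-precision Adam accounting (the 2/2/12 bytes-per-parameter split is a common rule of thumb, not a logged figure):

```python
stage_params = 101_544_960   # largest pipeline stage per tensor-parallel rank, from the log
dp = 4

fp16_weights_and_grads = stage_params * (2 + 2)   # bytes, replicated on every rank
adam_state_sharded = stage_params * 12 // dp      # fp32 master weights + 2 moments, split by ZeRO-1

total_gib = (fp16_weights_and_grads + adam_state_sharded) / 2**30
print(f"~{total_gib:.2f} GiB per GPU for model state, before activations")  # ~0.66 GiB
```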
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several 
copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings estimated model parameters: 1.209483264 warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.62471936estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.624784896estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.624784896 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings 
will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.62471936estimated model parameters: 1.62471936 /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated 
model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters: 1.624784896estimated model parameters: 1.624784896 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.624784896 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.62471936 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model 
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-05 14:10:37
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.183901 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.212 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.224 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-05 14:10:43
done with setup ...
training ...
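The split boundaries above correspond to a 94.9% / 5% / 0.1% document split (a 949:50:1 weighting is an inference from the numbers, not stated in the log). A quick consistency check in Python:

```python
# Split figures copied from the log above.
total_docs = 304_230_423
train_docs, valid_docs, test_docs = 288_714_672, 15_211_521, 304_230

assert train_docs + valid_docs + test_docs == total_docs
for name, n in (("train", train_docs), ("validation", valid_docs), ("test", test_docs)):
    print(f"{name:10s} {n / total_docs:.4%}")
# train      94.9000%
# validation 5.0000%
# test       0.1000%
```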
time (ms) | model-and-optimizer-setup: 3832.94 | train/valid/test-data-iterators-setup: 5366.36
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
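The distinct totals differ by pipeline stage: the ~1.62B figures include the embedding copies held on the first and last pipeline stages (exactly the inaccuracy the UserWarning above flags), while ~1.21B is the transformer body alone. The body count matches a standard GPT sizing estimate. A rough sketch, assuming a hidden size of 2048 and a padded vocabulary around 50k (both assumptions; only the 24 layers are confirmed, by the checkpointing info below):

```python
# Rough GPT-style parameter sizing; hidden size and vocab are assumed values.
L, h, V = 24, 2048, 50_304

transformer = 12 * L * h * h   # attention + MLP weight matrices; biases and layernorms add ~0.002B more
embedding = V * h

print(f"body:           {transformer / 1e9:.3f}B")                # 1.208B vs. the logged ~1.2095B
print(f"with embedding: {(transformer + embedding) / 1e9:.3f}B")  # ~1.31B, i.e. the "1B3" in the run name
```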
[before the start of training step] datetime: 2021-10-05 14:10:43
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-05 14:10:43,934] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
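The block above says the run uses activation checkpointing across its 24 layers, with partitioning, CPU offload, contiguous memory, synchronization, and profiling all disabled. A minimal plain-PyTorch illustration of the underlying idea (not DeepSpeed's checkpointing module itself):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Stand-in for one transformer block; sizes are illustrative only.
layer = torch.nn.Sequential(
    torch.nn.Linear(2048, 8192), torch.nn.GELU(), torch.nn.Linear(8192, 2048)
)
x = torch.randn(4, 2048, requires_grad=True)

# Only the input is stored during forward; the intermediate activations
# inside `layer` are recomputed during backward, trading compute for memory.
y = checkpoint(layer, x)
y.sum().backward()
```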
[Rank 49] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6758.0 | max reserved: 6758.0
[Rank 51] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6710.0 | max reserved: 6710.0
iteration 111400/ 152972 | consumed samples: 51957184 | elapsed time per iteration (ms): 6185.6 | learning rate: 4.747E-05 | global batch size: 512 | lm loss: 2.773749E+00 | loss scale: 2097152.0 | grad norm: 181915.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 35] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 19] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4780.0 | max reserved: 4780.0
[Rank 33] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 17] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 1] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5494.0 | max reserved: 5494.0
[Rank 3] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 34] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4364.0 | max reserved: 4364.0
[Rank 2] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5414.0 | max reserved: 5414.0
[Rank 18] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 50] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 16] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0
[Rank 32] (after 111400 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 0] (after 111400 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5510.0 | max reserved: 5510.0
[Rank 48] (after 111400 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
time (ms)
iteration 111600/ 152972 | consumed samples: 52059584 | elapsed time per iteration (ms): 6030.8 | learning rate: 4.714E-05 | global batch size: 512 | lm loss: 2.770647E+00 | loss scale: 2097152.0 | grad norm: 192287.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 111800/ 152972 | consumed samples: 52161984 | elapsed time per iteration (ms): 5956.7 | learning rate: 4.681E-05 | global batch size: 512 | lm loss: 2.773291E+00 | loss scale: 524288.0 | grad norm: 51923.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
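A quick consistency check on the counters above: consecutive entries are 200 iterations apart, and the consumed-samples column advances by exactly 200 times the global batch size. (Note that 111400 x 512 would be about 57.0M, more than the 51.96M consumed, which is consistent with a smaller global batch size earlier in the run; that ramp-up is an inference, not shown in this excerpt.)

```python
# Values taken from the iteration 111400 and 111600 entries above.
consumed_111400, consumed_111600 = 51_957_184, 52_059_584
assert consumed_111600 - consumed_111400 == 200 * 512  # 200 iterations x global batch 512
```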
[2021-10-05 15:24:50,636] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=247, lr=[4.6477573812924025e-05, 4.6477573812924025e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 112000 loss: 2.7388 iter time (s): 0.003 samples/sec: 172505.592
iteration 112000/ 152972 | consumed samples: 52264384 | elapsed time per iteration (ms): 5948.4 | learning rate: 4.648E-05 | global batch size: 512 | lm loss: 2.772834E+00 | loss scale: 524288.0 | grad norm: 49146.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 112000 | lm loss value: 2.724427E+00 | lm loss PPL: 1.524768E+01 |
--------------------------------------------------------------------------------------------------
iteration 112200/ 152972 | consumed samples: 52366784 | elapsed time per iteration (ms): 6823.3 | learning rate: 4.615E-05 | global batch size: 512 | lm loss: 2.773536E+00 | loss scale: 524288.0 | grad norm: 48247.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 112400/ 152972 | consumed samples: 52469184 | elapsed time per iteration (ms): 5990.0 | learning rate: 4.582E-05 | global batch size: 512 | lm loss: 2.773350E+00 | loss scale: 1048576.0 | grad norm: 94660.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 16:17:33,163] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step112500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2848.78
iteration 112600/ 152972 | consumed samples: 52571584 | elapsed time per iteration (ms): 6004.0 | learning rate: 4.550E-05 | global batch size: 512 | lm loss: 2.774128E+00 | loss scale: 1048576.0 | grad norm: 102474.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 112800/ 152972 | consumed samples: 52673984 | elapsed time per iteration (ms): 5980.7 | learning rate: 4.517E-05 | global batch size: 512 | lm loss: 2.771654E+00 | loss scale: 524288.0 | grad norm: 49631.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113000/ 152972 | consumed samples: 52776384 | elapsed time per iteration (ms): 6008.7 | learning rate: 4.485E-05 | global batch size: 512 | lm loss: 2.773222E+00 | loss scale: 262144.0 | grad norm: 24120.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 113000 | lm loss value: 2.720957E+00 | lm loss PPL: 1.519485E+01 |
--------------------------------------------------------------------------------------------------
iteration 113200/ 152972 | consumed samples: 52878784 | elapsed time per iteration (ms): 6862.0 | learning rate: 4.453E-05 | global batch size: 512 | lm loss: 2.776423E+00 | loss scale: 262144.0 | grad norm: 25125.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
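The lm loss PPL column is simply the exponential of the validation lm loss; for example, for the iteration-113000 block above:

```python
import math

lm_loss = 2.720957                 # validation lm loss at iteration 113000, from the log
print(f"{math.exp(lm_loss):.5f}")  # 15.19485, i.e. the logged lm loss PPL of 1.519485E+01
```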
iteration 113400/ 152972 | consumed samples: 52981184 | elapsed time per iteration (ms): 6026.4 | learning rate: 4.420E-05 | global batch size: 512 | lm loss: 2.776411E+00 | loss scale: 262144.0 | grad norm: 25757.039 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113600/ 152972 | consumed samples: 53083584 | elapsed time per iteration (ms): 5976.5 | learning rate: 4.389E-05 | global batch size: 512 | lm loss: 2.777685E+00 | loss scale: 262144.0 | grad norm: 24015.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 113800/ 152972 | consumed samples: 53185984 | elapsed time per iteration (ms): 5977.4 | learning rate: 4.357E-05 | global batch size: 512 | lm loss: 2.777700E+00 | loss scale: 262144.0 | grad norm: 25625.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-05 18:50:12,944] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=252, lr=[4.324816525536577e-05, 4.324816525536577e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 114000/ 152972 | consumed samples: 53288384 | elapsed time per iteration (ms): 5962.6 | learning rate: 4.325E-05 | global batch size: 512 | lm loss: 2.773806E+00 | loss scale: 262144.0 | grad norm: 28198.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 114000 loss: 2.7807 iter time (s): 0.003 samples/sec: 171393.162
--------------------------------------------------------------------------------------------------
validation loss at iteration 114000 | lm loss value: 2.725353E+00 | lm loss PPL: 1.526180E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 18:53:12,401] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step114000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2408.44
iteration 114200/ 152972 | consumed samples: 53390784 | elapsed time per iteration (ms): 6874.7 | learning rate: 4.293E-05 | global batch size: 512 | lm loss: 2.774811E+00 | loss scale: 524288.0 | grad norm: 48233.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114400/ 152972 | consumed samples: 53493184 | elapsed time per iteration (ms): 5988.2 | learning rate: 4.261E-05 | global batch size: 512 | lm loss: 2.773128E+00 | loss scale: 524288.0 | grad norm: 50387.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114600/ 152972 | consumed samples: 53595584 | elapsed time per iteration (ms): 5975.6 | learning rate: 4.230E-05 | global batch size: 512 | lm loss: 2.776219E+00 | loss scale: 1048576.0 | grad norm: 111799.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 114800/ 152972 | consumed samples: 53697984 | elapsed time per iteration (ms): 5970.6 | learning rate: 4.199E-05 | global batch size: 512 | lm loss: 2.776233E+00 | loss scale: 1048576.0 | grad norm: 104622.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115000/ 152972 | consumed samples: 53800384 | elapsed time per iteration (ms): 5953.3 | learning rate: 4.168E-05 | global batch size: 512 | lm loss: 2.773017E+00 | loss scale: 524288.0 | grad norm: 52131.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 115000 | lm loss value: 2.723079E+00 | lm loss PPL: 1.522713E+01 |
--------------------------------------------------------------------------------------------------
iteration 115200/ 152972 | consumed samples: 53902784 | elapsed time per iteration (ms): 6850.6 | learning rate: 4.137E-05 | global batch size: 512 | lm loss: 2.771291E+00 | loss scale: 524288.0 | grad norm: 52286.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115400/ 152972 | consumed samples: 54005184 | elapsed time per iteration (ms): 5969.4 | learning rate: 4.106E-05 | global batch size: 512 | lm loss: 2.772654E+00 | loss scale: 524288.0 | grad norm: 53750.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-05 21:25:27,412] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step115500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2275.69
iteration 115600/ 152972 | consumed samples: 54107584 | elapsed time per iteration (ms): 5975.9 | learning rate: 4.075E-05 | global batch size: 512 | lm loss: 2.773282E+00 | loss scale: 262144.0 | grad norm: 25349.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 115800/ 152972 | consumed samples: 54209984 | elapsed time per iteration (ms): 5965.3 | learning rate: 4.044E-05 | global batch size: 512 | lm loss: 2.775505E+00 | loss scale: 262144.0 | grad norm: 25149.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-05 22:15:11,785] [INFO] [logging.py:68:log_dist] [Rank 0] step=116000, skipped=256, lr=[4.013634096435418e-05, 4.013634096435418e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 116000 loss: 2.7789 iter time (s): 0.003 samples/sec: 171690.472
iteration 116000/ 152972 | consumed samples: 54312384 | elapsed time per iteration (ms): 5970.6 | learning rate: 4.014E-05 | global batch size: 512 | lm loss: 2.770437E+00 | loss scale: 524288.0 | grad norm: 50651.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 116000 | lm loss value: 2.718214E+00 | lm loss PPL: 1.515323E+01 |
--------------------------------------------------------------------------------------------------
iteration 116200/ 152972 | consumed samples: 54414784 | elapsed time per iteration (ms): 6847.5 | learning rate: 3.983E-05 | global batch size: 512 | lm loss: 2.775285E+00 | loss scale: 262144.0 | grad norm: 25363.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116400/ 152972 | consumed samples: 54517184 | elapsed time per iteration (ms): 5963.2 | learning rate: 3.953E-05 | global batch size: 512 | lm loss: 2.772805E+00 | loss scale: 131072.0 | grad norm: 13767.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116600/ 152972 | consumed samples: 54619584 | elapsed time per iteration (ms): 5969.6 | learning rate: 3.923E-05 | global batch size: 512 | lm loss: 2.772243E+00 | loss scale: 131072.0 | grad norm: 13061.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 116800/ 152972 | consumed samples: 54721984 | elapsed time per iteration (ms): 5958.9 | learning rate: 3.893E-05 | global batch size: 512 | lm loss: 2.771802E+00 | loss scale: 262144.0 | grad norm: 27099.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117000/ 152972 | consumed samples: 54824384 | elapsed time per iteration (ms): 5961.7 | learning rate: 3.863E-05 | global batch size: 512 | lm loss: 2.773109E+00 | loss scale: 262144.0 | grad norm: 29962.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 117000 | lm loss value: 2.720808E+00 | lm loss PPL: 1.519259E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 00:00:27,569] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step117000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 2172.02
iteration 117200/ 152972 | consumed samples: 54926784 | elapsed time per iteration (ms): 6845.5 | learning rate: 3.833E-05 | global batch size: 512 | lm loss: 2.773669E+00 | loss scale: 262144.0 | grad norm: 24896.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117400/ 152972 | consumed samples: 55029184 | elapsed time per iteration (ms): 5960.3 | learning rate: 3.803E-05 | global batch size: 512 | lm loss: 2.769607E+00 | loss scale: 524288.0 | grad norm: 51939.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117600/ 152972 | consumed samples: 55131584 | elapsed time per iteration (ms): 5971.4 | learning rate: 3.774E-05 | global batch size: 512 | lm loss: 2.769320E+00 | loss scale: 524288.0 | grad norm: 50725.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 117800/ 152972 | consumed samples: 55233984 | elapsed time per iteration (ms): 5977.5 | learning rate: 3.744E-05 | global batch size: 512 | lm loss: 2.772576E+00 | loss scale: 524288.0 | grad norm: 52865.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 01:39:55,142] [INFO] [logging.py:68:log_dist] [Rank 0] step=118000, skipped=259, lr=[3.714829298594639e-05, 3.714829298594639e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 118000/ 152972 | consumed samples: 55336384 | elapsed time per iteration (ms): 5961.3 | learning rate: 3.715E-05 | global batch size: 512 | lm loss: 2.767223E+00 | loss scale: 524288.0 | grad norm: 51697.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 118000 loss: 2.7598 iter time (s): 0.003 samples/sec: 171611.772
--------------------------------------------------------------------------------------------------
validation loss at iteration 118000 | lm loss value: 2.717343E+00 | lm loss PPL: 1.514005E+01 |
--------------------------------------------------------------------------------------------------
iteration 118200/ 152972 | consumed samples: 55438784 | elapsed time per iteration (ms): 6854.4 | learning rate: 3.686E-05 | global batch size: 512 | lm loss: 2.771942E+00 | loss scale: 1048576.0 | grad norm: 101223.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 118400/ 152972 | consumed samples: 55541184 | elapsed time per iteration (ms): 5977.2 | learning rate: 3.657E-05 | global batch size: 512 | lm loss: 2.770937E+00 | loss scale: 1048576.0 | grad norm: 97509.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 02:32:38,444] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step118500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1935.83
iteration 118600/ 152972 | consumed samples: 55643584 | elapsed time per iteration (ms): 5981.7 | learning rate: 3.628E-05 | global batch size: 512 | lm loss: 2.768575E+00 | loss scale: 1048576.0 | grad norm: 103656.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 118800/ 152972 | consumed samples: 55745984 | elapsed time per iteration (ms): 5985.8 | learning rate: 3.599E-05 | global batch size: 512 | lm loss: 2.767637E+00 | loss scale: 1048576.0 | grad norm: 110427.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119000/ 152972 | consumed samples: 55848384 | elapsed time per iteration (ms): 5973.1 | learning rate: 3.570E-05 | global batch size: 512 | lm loss: 2.766892E+00 | loss scale: 2097152.0 | grad norm: 205401.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 119000 | lm loss value: 2.720641E+00 | lm loss PPL: 1.519005E+01 |
--------------------------------------------------------------------------------------------------
iteration 119200/ 152972 | consumed samples: 55950784 | elapsed time per iteration (ms): 6854.4 | learning rate: 3.542E-05 | global batch size: 512 | lm loss: 2.768917E+00 | loss scale: 1048576.0 | grad norm: 114130.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119400/ 152972 | consumed samples: 56053184 | elapsed time per iteration (ms): 5980.2 | learning rate: 3.514E-05 | global batch size: 512 | lm loss: 2.766753E+00 | loss scale: 524288.0 | grad norm: 49946.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119600/ 152972 | consumed samples: 56155584 | elapsed time per iteration (ms): 5983.7 | learning rate: 3.485E-05 | global batch size: 512 | lm loss: 2.768507E+00 | loss scale: 524288.0 | grad norm: 48961.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 119800/ 152972 | consumed samples: 56257984 | elapsed time per iteration (ms): 5982.7 | learning rate: 3.457E-05 | global batch size: 512 | lm loss: 2.767091E+00 | loss scale: 1048576.0 | grad norm: 99700.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 05:05:06,967] [INFO] [logging.py:68:log_dist] [Rank 0] step=120000, skipped=265, lr=[3.429557656883248e-05, 3.429557656883248e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 120000/ 152972 | consumed samples: 56360384 | elapsed time per iteration (ms): 5985.9 | learning rate: 3.430E-05 | global batch size: 512 | lm loss: 2.768388E+00 | loss scale: 524288.0 | grad norm: 54959.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
steps: 120000 loss: 2.7499 iter time (s): 0.003 samples/sec: 171229.641
--------------------------------------------------------------------------------------------------
validation loss at iteration 120000 | lm loss value: 2.714664E+00 | lm loss PPL: 1.509954E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 05:08:02,812] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step120000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1629.89
iteration 120200/ 152972 | consumed samples: 56462784 | elapsed time per iteration (ms): 6850.0 | learning rate: 3.402E-05 | global batch size: 512 | lm loss: 2.769492E+00 | loss scale: 524288.0 | grad norm: 51441.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120400/ 152972 | consumed samples: 56565184 | elapsed time per iteration (ms): 5965.5 | learning rate: 3.374E-05 | global batch size: 512 | lm loss: 2.767628E+00 | loss scale: 524288.0 | grad norm: 49148.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120600/ 152972 | consumed samples: 56667584 | elapsed time per iteration (ms): 5958.6 | learning rate: 3.346E-05 | global batch size: 512 | lm loss: 2.764325E+00 | loss scale: 1048576.0 | grad norm: 100152.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 120800/ 152972 | consumed samples: 56769984 | elapsed time per iteration (ms): 5959.9 | learning rate: 3.319E-05 | global batch size: 512 | lm loss: 2.763713E+00 | loss scale: 1048576.0 | grad norm: 99822.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121000/ 152972 | consumed samples: 56872384 | elapsed time per iteration (ms): 5954.0 | learning rate: 3.292E-05 | global batch size: 512 | lm loss: 2.767021E+00 | loss scale: 1048576.0 | grad norm: 99876.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 121000 | lm loss value: 2.713844E+00 | lm loss PPL: 1.508716E+01 |
--------------------------------------------------------------------------------------------------
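The loss scale column moves in powers of two (131072 up to 2097152), and the skipped counter in the step lines grows slowly (247 at step 112000, 270 at step 122000). That is the signature of fp16 dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, while a long streak of clean steps doubles it again. A generic sketch of that policy (illustrative only, not DeepSpeed's exact implementation):

```python
# Generic dynamic loss-scaling policy; parameter values are illustrative.
class DynamicLossScaler:
    def __init__(self, init_scale: float = 2.0**21, growth_interval: int = 1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, overflow: bool) -> None:
        if overflow:
            self.scale /= 2          # skip this step's update and halve the scale
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2      # long clean streak: try a larger scale again
```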
iteration 121200/ 152972 | consumed samples: 56974784 | elapsed time per iteration (ms): 6864.3 | learning rate: 3.265E-05 | global batch size: 512 | lm loss: 2.765081E+00 | loss scale: 2097152.0 | grad norm: 203364.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121400/ 152972 | consumed samples: 57077184 | elapsed time per iteration (ms): 5984.9 | learning rate: 3.238E-05 | global batch size: 512 | lm loss: 2.765125E+00 | loss scale: 1048576.0 | grad norm: 102147.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 07:40:13,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step121500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1555.19
iteration 121600/ 152972 | consumed samples: 57179584 | elapsed time per iteration (ms): 5995.1 | learning rate: 3.211E-05 | global batch size: 512 | lm loss: 2.763338E+00 | loss scale: 1048576.0 | grad norm: 102610.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 121800/ 152972 | consumed samples: 57281984 | elapsed time per iteration (ms): 5978.3 | learning rate: 3.184E-05 | global batch size: 512 | lm loss: 2.766241E+00 | loss scale: 1048576.0 | grad norm: 96556.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-10-06 08:30:04,092] [INFO] [logging.py:68:log_dist] [Rank 0] step=122000, skipped=270, lr=[3.157777721059308e-05, 3.157777721059308e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 122000 loss: 2.7637 iter time (s): 0.003 samples/sec: 171643.766
iteration 122000/ 152972 | consumed samples: 57384384 | elapsed time per iteration (ms): 5974.9 | learning rate: 3.158E-05 | global batch size: 512 | lm loss: 2.763846E+00 | loss scale: 1048576.0 | grad norm: 160880.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
--------------------------------------------------------------------------------------------------
validation loss at iteration 122000 | lm loss value: 2.711936E+00 | lm loss PPL: 1.505840E+01 |
--------------------------------------------------------------------------------------------------
iteration 122200/ 152972 | consumed samples: 57486784 | elapsed time per iteration (ms): 6847.8 | learning rate: 3.131E-05 | global batch size: 512 | lm loss: 2.764269E+00 | loss scale: 1048576.0 | grad norm: 99689.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122400/ 152972 | consumed samples: 57589184 | elapsed time per iteration (ms): 5972.8 | learning rate: 3.105E-05 | global batch size: 512 | lm loss: 2.765003E+00 | loss scale: 524288.0 | grad norm: 51013.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122600/ 152972 | consumed samples: 57691584 | elapsed time per iteration (ms): 5969.6 | learning rate: 3.079E-05 | global batch size: 512 | lm loss: 2.764298E+00 | loss scale: 524288.0 | grad norm: 52170.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 122800/ 152972 | consumed samples: 57793984 | elapsed time per iteration (ms): 5968.3 | learning rate: 3.053E-05 | global batch size: 512 | lm loss: 2.764158E+00 | loss scale: 1048576.0 | grad norm: 101626.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
saving checkpoint at iteration 122871 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 09:59:41,714] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step122871/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 122871 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1583.97
[exiting program after 1190.0879358172417 minutes] datetime: 2021-10-06 09:59:42
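The job stops on a wall-clock budget rather than at a round iteration count: the final checkpoint lands at step 122871 and the process exits after about 1190 minutes (~19.8 h), after which the launcher banners below mark an automatic restart. A sketch of that exit logic, assuming a ~1190-minute budget and hypothetical helper names (the exact mechanism is an inference from the "[exiting program after ...]" line alone):

```python
import sys
import time

START_TIME = time.time()
EXIT_DURATION_IN_MINS = 1190  # assumed wall-clock budget, inferred from the log line

def past_time_budget() -> bool:
    return (time.time() - START_TIME) / 60.0 > EXIT_DURATION_IN_MINS

# Checked periodically inside the training loop:
#   if past_time_budget():
#       save_checkpoint(iteration)  # hypothetical helper; explains the odd step 122871
#       sys.exit()
```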
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2021-10-06 10:00:41.383364: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
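Each rank logs one such line when TensorFlow resolves the CUDA runtime. Whether that library is loadable can be checked directly; a sketch assuming the same soname as in the log line above:

import ctypes

# Raises OSError if the CUDA runtime TensorFlow reported opening
# is not actually resolvable on this node.
ctypes.CDLL("libcudart.so.11.0")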
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
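The report above lists which DeepSpeed C++/CUDA extensions are prebuilt ([YES] under installed) versus merely JIT-compilable on first use ([OKAY] under compatible); here only cpu_adam is prebuilt, and the fused optimizers and transformer kernels would be built by ninja on demand. The same report can be regenerated on a node with DeepSpeed's bundled CLI; a sketch:

import subprocess

# `ds_report` ships with the deepspeed package and prints the same
# op-compatibility and environment tables seen in this log.
subprocess.run(["ds_report"], check=True)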
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']
torch version .................... 1.8.1
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']
deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
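The torch-related fields in the block above can be cross-checked from torch itself; a sketch (the deepspeed install path and version come from DeepSpeed's own report, not from torch):

import torch

# The torch wheel's view of the stack, matching the block above:
# 1.8.1 built against CUDA 11.1.
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)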
/bin/sh: line 0: type: git: not found
**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
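Megatron records the repository state at startup; on this cluster the compute nodes have no git binary, so both fields degrade to unknown rather than aborting the run. A sketch of a probe with the same fallback behavior (ours, not Megatron's actual code):

import subprocess

def git_hash() -> str:
    # Fall back to "unknown" when git is missing or the tree is not a repo.
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return "unknown"

print(f"**** Git info for Megatron: git_hash={git_hash()} ****")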
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.1 1.8.1 torch cuda versiontorch cuda version DeepSpeed general environment info: .............................. 11.1 11.1nvcc version nvcc version..................... torch install path .....................11.2 11.2...............deepspeed install path deepspeed install path........... ...........['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed info deepspeed info...................torch version ................... 0.4.2+72ce55a, 72ce55a, big-science .................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. 1.8.1 deepspeed wheel compiled w....... torch 1.8, cuda 11.1torch cuda version...... ...............torch 1.8, cuda 11.1 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op nameop name ................................................ ................ installed installedinstalled ..installed.. compatible.... compatiblecompatible--------------------------------------------------compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ...............cpu_adam ............... ...............[YES]............... [YES][YES][YES]...... ......[OKAY]...... [OKAY]......[OKAY] [OKAY] fused_adam ............. [NO] fused_adam.......fused_adam ............. fused_adam [OKAY] ............. .............[NO] fused_lamb .......[NO].............[NO] ....... [OKAY][NO] .......[OKAY] .......[OKAY] fused_lamb [OKAY] .............fused_lamb fused_lamb [NO].......................... 
.......[NO][NO] .......[OKAY]....... sparse_attn [OKAY] ............[OKAY] [NO] ....... [OKAY] transformer sparse_attn............ ............[NO]sparse_attn [NO]................... sparse_attn .......[NO][OKAY] [OKAY]................... stochastic_transformer[OKAY] transformer.[NO] transformer.......[NO]............ [OKAY]............[NO]....... [NO][OKAY]....... .......transformer[OKAY] [OKAY]............ stochastic_transformer .stochastic_transformer [NO][NO]. .............. [NO] [OKAY] [OKAY] ....... [OKAY] stochastic_transformer . [NO] ....... [OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.11.8.1 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found ninjaninjaninjaninja ...................................................... ..................[OKAY] [OKAY] [OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------op nameop name op name ................op name ................ installedinstalled ................ ....................installed installed compatible ..compatible .. --------------------------------------------------compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam ............... cpu_adam[YES]cpu_adam............... ...... ...............[YES] ............... [OKAY] [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] fused_adam ............. [NO] .......fused_adam fused_adam [OKAY] fused_adam............. ............. .............[NO]fused_lamb [NO]............. .......[NO] ....... [OKAY][OKAY].......[NO] [OKAY].......fused_lamb [OKAY]fused_lamb fused_lamb ............. ............. ............. [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY]sparse_attn [OKAY] ............ [NO] ....... [OKAY] transformersparse_attnsparse_attn ............sparse_attn............ ............ [NO] [NO]............ [NO] ....... ....... [NO][OKAY][OKAY]....... .......[OKAY] transformer [OKAY]stochastic_transformer ............ transformer . transformer[NO] [NO] ............ .......................... [NO][OKAY][OKAY][NO] .............. stochastic_transformer[OKAY][OKAY] . [NO]stochastic_transformerstochastic_transformer ........ . [OKAY] [NO] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** [NO] .............. 
[OKAY][OKAY] **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op nameop name................................ ................installed installed................ installed .. ..installedcompatible.. compatible--------------------------------------------------..compatible --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adamcpu_adam...... cpu_adam .............................................[OKAY] [YES][YES][YES] ...... ...... [OKAY] ...... [OKAY] fused_adam[OKAY] ............. [NO] ....... [OKAY] fused_adam .............fused_lambfused_adam fused_adam [NO]............. ............. .............[NO]....... [NO][NO].......[OKAY] ....... [OKAY]....... [OKAY]fused_lamb [OKAY] ............. fused_lamb[NO]fused_lamb ................................. [OKAY][NO]sparse_attn[NO] ....... ................... [OKAY][OKAY][NO] ....... [OKAY] sparse_attn transformer............ ............[NO] [NO]sparse_attn.......sparse_attn .......[OKAY]........................ [OKAY][NO][NO]transformer .......................... [OKAY]stochastic_transformer [OKAY][NO] . transformer .......transformer [NO] ............[OKAY] ............ .......[NO] [NO][OKAY]stochastic_transformer....... .......[OKAY]. [OKAY][NO] stochastic_transformer .......stochastic_transformer .[OKAY]. [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. 
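The environment block above is standard torch/deepspeed introspection; a minimal sketch that reproduces the same fields (the nvcc line comes from running `nvcc --version` on the host, which is omitted here):

    # Sketch: print the fields of the "DeepSpeed general environment info" block.
    import torch
    import deepspeed

    print("torch install path ...............", list(torch.__path__))
    print("torch version ....................", torch.__version__)
    print("torch cuda version ...............", torch.version.cuda)
    print("deepspeed install path ...........", list(deepspeed.__path__))
    print("deepspeed info ...................", deepspeed.__version__)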
[WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1269478.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ False
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... None
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
no_train ........................................ None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.rotary
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/tr4c-1B3-rotary-oscar-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
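The ramp-up message follows from rampup_batch_size = ['32', '32', '2_000_000'] in the arguments: start at a global batch size of 32, grow in increments of 32, and spread the 15 increments needed to reach 512 evenly over 2,000,000 consumed samples. A sketch of that schedule (a hypothetical helper mirroring the stated rule, not the Megatron code itself), with a sanity check that the parallel sizes multiply out to the world size:

    # data-parallel x tensor-parallel x pipeline-parallel ranks = world size
    assert 4 * 4 * 4 == 64

    def global_batch_size(consumed_samples, start=32, increment=32,
                          final=512, rampup_samples=2_000_000):
        num_increments = (final - start) // increment           # 15 steps of +32
        samples_per_increment = rampup_samples / num_increments # ~133,333 samples each
        steps = int(consumed_samples / samples_per_increment)
        return min(final, start + steps * increment)

    print(global_batch_size(0))           # 32 at the start of training
    print(global_batch_size(1_000_000))   # 256 halfway through the ramp
    print(global_batch_size(2_000_000))   # 512 once the ramp completes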
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
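The 431 dummy tokens come from rounding the GPT-2 vocabulary (50257) up to a multiple of make_vocab_size_divisible_by x tensor_model_parallel_size = 128 x 4 = 512, so the embedding table partitions evenly across the 4 tensor-parallel ranks. A sketch of the rounding rule (Megatron reaches the same result by incrementing until divisible):

    def pad_vocab_size(orig_vocab_size, divisible_by, tp_size):
        # Round up to a multiple of divisible_by * tp_size.
        multiple = divisible_by * tp_size
        return ((orig_vocab_size + multiple - 1) // multiple) * multiple

    padded = pad_vocab_size(50257, 128, 4)  # make_vocab_size_divisible_by=128, TP=4
    print(padded, padded - 50257)           # -> 50688 431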
> setting tensorboard ...
/bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info:torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch install pathtorch version['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] ............... .................... torch version1.8.1 .................... torch cuda version1.8.1 ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch']............... 11.1torch cuda version torch versionnvcc version............... .........................................11.1 1.8.111.2nvcc version torch cuda versiondeepspeed install path..................... ..........................11.2 11.1['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']deepspeed install path ...........deepspeed info nvcc version ...................['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed']..................... 0.4.2+72ce55a, 72ce55a, big-sciencedeepspeed info11.2 deepspeed wheel compiled w....................deepspeed install path ......0.4.2+72ce55a, 72ce55a, big-science........... torch 1.8, cuda 11.1deepspeed wheel compiled w. ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] ...... deepspeed infotorch 1.8, cuda 11.1 ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch'] torch version .................... 1.8.1 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/deepspeed-big-science/deepspeed'] deepspeed info ................... 0.4.2+72ce55a, 72ce55a, big-science deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found /bin/sh: line 0: type: git: not found **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** **** Git info for Megatron: git_hash=unknown git_branch=unknown **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY]-------------------------------------------------- [OKAY]---------------------------------------------------------------------------------------------------- op name op name-------------------------------------------------- ................op name................ installed op name ................installed.. ..................compatible installed -------------------------------------------------- compatibleinstalled.. --------------------------------------------------compatible.. compatible-------------------------------------------------- --------------------------------------------------cpu_adam ...............cpu_adam [YES]...............cpu_adam ...... [YES] [OKAY]cpu_adam..................... [YES][OKAY]............... ...... [YES][OKAY] ......fused_adam [OKAY]............. fused_adam [NO]............. fused_adam .......[NO] [OKAY].................... fused_adam [OKAY] [NO] fused_lamb............. ....................[NO] fused_lamb[OKAY][NO] ....... ............. ....... [NO][OKAY][OKAY]fused_lamb .......fused_lamb .............[OKAY] [NO]............. .......[NO] sparse_attn[OKAY]....... ............ [OKAY]sparse_attn[NO] ................... [NO][OKAY] .......sparse_attn transformer [OKAY] ........................sparse_attn transformer[NO][NO]............ ............ .......[NO]....... [NO][OKAY][OKAY]....... ....... [OKAY]stochastic_transformer[OKAY] transformer.transformer stochastic_transformer[NO]............ ............ . .......[NO] [NO] [NO][OKAY] ....... .............. [OKAY][OKAY] [OKAY] stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY]  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... [OKAY]quantizer .............. [NO]quantizer ..................... [OKAY][NO] ....... --------------------------------------------------[OKAY] --------------------------------------------------  [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. [WARNING]  async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. 
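The report above is the same table DeepSpeed's ds_report command prints; every one of the 64 ranks emits a copy at start-up. The async_io rows read [NO] only because libaio-dev is missing on the compute nodes, exactly as the WARNING line says. A minimal sketch of querying an op's status programmatically, assuming a stock DeepSpeed install that exposes the op builders under deepspeed.ops.op_builder (the builder name is an assumption based on recent DeepSpeed releases):

    # Sketch: query DeepSpeed op compatibility the way ds_report does.
    # Assumes deepspeed is importable and AsyncIOBuilder is available
    # under deepspeed.ops.op_builder.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    # Expected to return False on these nodes, since libaio-dev is not
    # installed, matching the [NO] entries in the report above.
    print("async_io compatible:", AsyncIOBuilder().is_compatible())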
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-10-06 10:00:50,533] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.288 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/uue59kq/repos/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
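The warning is harmless here: the fused kernels were built in an earlier run, so ninja reports no work to do and the cached modules load. The trigger is that torch.utils.cpp_extension probes the compiler named by the CXX environment variable and falls back to the generic c++ driver, which on this machine is not the g++ toolchain PyTorch was built with. A hedged sketch of the usual workaround, assuming g++ is actually on PATH on the compute nodes:

    # Sketch of a common workaround for the cpp_extension compiler warning:
    # make $CXX point at g++ before any fused kernel is JIT-compiled.
    # Assumes g++ is installed; torch.utils.cpp_extension uses the plain
    # 'c++' driver when CXX is unset, which is what triggers the warning.
    import os
    os.environ.setdefault("CXX", "g++")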
>>> done with compiling and loading fused kernels. Compilation time: 12.671 seconds
time to initialize megatron (seconds): 63.495
[after megatron is initialized] datetime: 2021-10-06 10:01:03
building GPT model ...
[2021-10-06 10:01:03,595] [INFO] [utils.py:680:see_memory_usage] Before Building Model
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:373: FutureWarning: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
warnings.warn(
/gpfswork/rech/six/commun/conda/hf-prod/lib/python3.8/site-packages/torch/cuda/memory.py:381: FutureWarning: torch.cuda.max_memory_cached has been renamed to torch.cuda.max_memory_reserved
warnings.warn(
[2021-10-06 10:01:03,597] [INFO] [utils.py:681:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-10-06 10:01:03,598] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 36.8 GB, percent = 19.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-10-06 10:01:04,120] [INFO] [module.py:360:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
0: _to_float16
1: EmbeddingPipe
2:
3: ParallelTransformerLayerPipe
4: ParallelTransformerLayerPipe
5: ParallelTransformerLayerPipe
6: ParallelTransformerLayerPipe
7: ParallelTransformerLayerPipe
8: ParallelTransformerLayerPipe
stage=1 layers=6
9: ParallelTransformerLayerPipe
10: ParallelTransformerLayerPipe
11: ParallelTransformerLayerPipe
12: ParallelTransformerLayerPipe
13: ParallelTransformerLayerPipe
14: ParallelTransformerLayerPipe
stage=2 layers=6
15: ParallelTransformerLayerPipe
16: ParallelTransformerLayerPipe
17: ParallelTransformerLayerPipe
18: ParallelTransformerLayerPipe
19: ParallelTransformerLayerPipe
20: ParallelTransformerLayerPipe
stage=3 layers=10
21: ParallelTransformerLayerPipe
22: ParallelTransformerLayerPipe
23: ParallelTransformerLayerPipe
24: ParallelTransformerLayerPipe
25: ParallelTransformerLayerPipe
26: ParallelTransformerLayerPipe
27:
28: MixedFusedLayerNorm
29: EmbeddingPipe
30: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (2, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (1, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (2, 0): 101544960
> number of parameters on (tensor, pipeline) model parallel rank (3, 3): 101549056
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 101544960
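The topology dump above is a row-major enumeration of a 4 x 4 x 4 (pipe, data, model) grid: the tensor-model rank varies fastest, then data, then pipe. A minimal sketch that reproduces the logged mapping (grid sizes taken from the log; the function name is mine):

    # Sketch: reproduce the ProcessCoord -> global rank mapping logged above.
    # PP, DP, TP sizes come from the log (4 x 4 x 4 = 64 ranks).
    PP, DP, TP = 4, 4, 4

    def global_rank(pipe: int, data: int, model: int) -> int:
        # model varies fastest, then data, then pipe (row-major order)
        return (pipe * DP + data) * TP + model

    assert global_rank(0, 0, 0) == 0
    assert global_rank(0, 1, 0) == 4
    assert global_rank(1, 0, 0) == 16
    assert global_rank(3, 3, 3) == 63  # matches the last entry in the log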
[2021-10-06 10:01:04,443] [INFO] [utils.py:680:see_memory_usage] After Building Model
[2021-10-06 10:01:04,444] [INFO] [utils.py:681:see_memory_usage] MA 0.2 GB Max_MA 0.22 GB CA 0.24 GB Max_CA 0 GB
[2021-10-06 10:01:04,444] [INFO] [utils.py:689:see_memory_usage] CPU Virtual Memory: used = 37.19 GB, percent = 19.9%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 101544960
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-10-06 10:01:04,466] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.4.2+72ce55a, git-hash=72ce55a, git-branch=big-science
[2021-10-06 10:01:04,531] [INFO] [engine.py:179:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-10-06 10:01:04,532] [INFO] [engine.py:736:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-10-06 10:01:04,532] [INFO] [engine.py:741:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-10-06 10:01:04,532] [INFO] [engine.py:750:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-10-06 10:01:04,532] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-10-06 10:01:04,532] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-10-06 10:01:04,532] [INFO] [stage2.py:106:__init__] Reduce bucket size 500000000
[2021-10-06 10:01:04,532] [INFO] [stage2.py:107:__init__] Allgather bucket size 500000000
[2021-10-06 10:01:04,532] [INFO] [stage2.py:108:__init__] CPU Offload: False
[2021-10-06 10:01:04,532] [INFO] [stage2.py:109:__init__] Round robin gradient partitioning: False
[2021-10-06 10:01:04,770] [INFO] [stage2.py:419:__init__] optimizer state initialized
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-10-06 10:01:04,771] [INFO] [engine.py:553:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-10-06 10:01:04,771] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-10-06 10:01:04,771] [INFO] [config.py:900:print] DeepSpeedEngine configuration:
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] allreduce_always_fp32 ........ False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] amp_enabled .................. False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] amp_params ................... False
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] checkpoint_tag_validation_enabled True
[2021-10-06 10:01:04,771] [INFO] [config.py:904:print] checkpoint_tag_validation_fail False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] disable_allgather ............ False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] dump_state ................... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_enabled ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_gas_boundary_resolution 1
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_layer_num ......... 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_max_iter .......... 100
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_stability ......... 1e-06
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_tol ............... 0.01
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] eigenvalue_verbose ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] elasticity_enabled ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] fp16_enabled ................. True
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] fp16_mixed_quantize .......... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] global_rank .................. 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_accumulation_steps .. 16
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_clipping ............ 1.0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] gradient_predivide_factor .... 1.0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] initial_dynamic_scale ........ 4096
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] loss_scale ................... 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] memory_breakdown ............. False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_legacy_fusion ...... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_name ............... None
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] optimizer_params ............. None
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pld_enabled .................. False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] pld_params ................... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] prescale_gradients ........... False
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_change_rate ......... 0.001
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_groups .............. 1
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_offset .............. 1000
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_period .............. 1000
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_rounding ............ 0
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_start_bits .......... 16
[2021-10-06 10:01:04,772] [INFO] [config.py:904:print] quantize_target_bits ......... 8
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_training_enabled .... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_type ................ 0
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] quantize_verbose ............. False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] scheduler_name ............... None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] scheduler_params ............. None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] sparse_attention ............. None
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] sparse_gradients_enabled ..... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] steps_per_print .............. 2000
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_enabled .......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] tensorboard_output_path ......
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] train_batch_size ............. 512
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] train_micro_batch_size_per_gpu 8
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] use_quantizer_kernel ......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] wall_clock_breakdown ......... False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] world_size ................... 4
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_allow_untested_optimizer False
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_config .................. { "stage": 1, "contiguous_gradients": false, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_enabled ................. True
[2021-10-06 10:01:04,773] [INFO] [config.py:904:print] zero_optimization_stage ...... 1
[2021-10-06 10:01:04,773] [INFO] [config.py:906:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-10-06 10:01:04,773] [INFO] [engine.py:76:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=101544960 (101.545M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=51 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=48 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=50 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
[2021-10-06 10:01:05,064] [INFO] [engine.py:134:__init__] RANK=49 STAGE=3 LAYERS=10 [21, 31) STAGE_PARAMS=101549056 (101.549M) TOTAL_PARAMS=1417117696 (1417.118M) UNIQUE_PARAMS=1313308672 (1313.309M)
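The engine line CONFIG: micro_batches=16 micro_batch_size=8, the config entries train_micro_batch_size_per_gpu 8 and gradient_accumulation_steps 16, the world_size of 4 (DeepSpeed's data-parallel world here), and train_batch_size 512 all have to satisfy a single identity. A quick check with the logged values:

    # Sanity check of the batch-size identity DeepSpeed enforces:
    #   train_batch_size = micro_batch_size * grad_accum_steps * data_parallel_size
    micro_batch_size = 8       # train_micro_batch_size_per_gpu
    grad_accum_steps = 16      # gradient_accumulation_steps (= micro_batches)
    data_parallel_size = 4     # world_size in the config; DP=4 in the topology
    assert micro_batch_size * grad_accum_steps * data_parallel_size == 512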
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 19
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 27
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 26
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 15
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 52
successfully loaded 4 ZeRO state_dicts for rank 11
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 50
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 60
successfully loaded 4 ZeRO state_dicts for rank 56
loading 4 zero partition checkpoints for rank 16
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 20
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 28
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 7
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 62
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 17
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 57
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 19
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 59
loading 4 zero partition checkpoints for rank 25
loading 4 zero partition checkpoints for rank 37
loading 4 zero partition checkpoints for rank 23
successfully loaded 4 ZeRO state_dicts for rank 4
loading 4 zero partition checkpoints for rank 30
successfully loaded 4 ZeRO state_dicts for rank 51
loading 4 zero partition checkpoints for rank 38
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 24
successfully loaded 4 ZeRO state_dicts for rank 0
loading 4 zero partition checkpoints for rank 27
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 29
successfully loaded 4 ZeRO state_dicts for rank 14
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 2
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 18
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 47
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 26
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 15
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 11
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 60
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 3
loading 4 zero partition checkpoints for rank 5
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 7
loading 4 zero partition checkpoints for rank 1
loading 4 zero partition checkpoints for rank 57
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 13
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 49
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 4
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 10
loading 4 zero partition checkpoints for rank 2
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 33
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 35
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 35
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints at iteration 122871
time (ms) | load-checkpoint: 3416.78
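The "> using checkpoint value" lines above restore the learning-rate schedule saved with the checkpoint: peak LR 2e-4, floor 1e-5, 183105 warmup iterations, cosine decay over the 73242187 total iterations. A minimal sketch of that warmup-plus-cosine shape with the logged numbers (Megatron's AnnealingLR is the real implementation and has more knobs; this only mirrors the overall curve):

    import math

    # Values restored from the checkpoint (see the log lines above).
    max_lr, min_lr = 2.0e-4, 1.0e-5
    warmup_iters, total_iters = 183105, 73242187

    def lr_at(it: int) -> float:
        # Linear warmup, then cosine decay from max_lr down to min_lr.
        if it < warmup_iters:
            return max_lr * it / warmup_iters
        frac = min(1.0, (it - warmup_iters) / (total_iters - warmup_iters))
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * frac))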
estimated model parameters: 1.209483264
estimated model parameters: 1.62471936
estimated model parameters: 1.624784896
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
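The per-rank estimates differ because pipeline stages hold different slices of the model, and, as the warning notes, the first and last stages additionally hold copies of the embeddings when PP > 1. A generic sketch of the with/without-embeddings distinction for any torch.nn.Module (illustrative only, not the Megatron-DeepSpeed utils.py implementation; `model` is a placeholder):

```python
import torch.nn as nn

def estimated_params_billions(model: nn.Module, without_embeddings: bool = False) -> float:
    """Count parameters in billions, optionally skipping embedding tables."""
    total = 0
    for module in model.modules():
        if without_embeddings and isinstance(module, nn.Embedding):
            continue
        # recurse=False so every parameter is counted exactly once
        total += sum(p.numel() for p in module.parameters(recurse=False))
    return total / 1e9
```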
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-06 10:01:08
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 5.461086 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.244 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.227 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.069 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-10-06 10:01:19
done with setup ...
training ...
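The split sizes above pin down how much data each phase actually touches. A quick check of the epoch counts and the total training token budget, assuming the 2048-token sequence length encoded in the `...2048sl...` index filenames:

```python
import math

train_target, train_samples_per_epoch = 73_242_187, 131_537_224
valid_target, valid_samples_two_epochs = 7_833_600, 13_854_322

# Train fits comfortably in a single pass over OSCAR:
print(train_target / train_samples_per_epoch)      # ~0.557 -> 1 epoch, as logged

# Validation needs a second pass, matching "total number of epochs: 2":
valid_per_epoch = valid_samples_two_epochs / 2     # ~6.93M samples per pass
print(math.ceil(valid_target / valid_per_epoch))   # 2

# Total training budget in tokens:
print(train_target * 2048 / 1e9)                   # ~150B tokens
```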
Number of parameters: 1.209483264 billion
Number of parameters: 1.62471936 billion
Number of parameters: 1.624784896 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
time (ms) | model-and-optimizer-setup: 5085.37 | train/valid/test-data-iterators-setup: 10652.72
[before the start of training step] datetime: 2021-10-06 10:01:19
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:408:forward] Activation Checkpointing Information
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:409:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:412:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:415:forward] ----Synchronization False
[2021-10-06 10:01:20,147] [INFO] [checkpointing.py:416:forward] ----Profiling time in checkpointing False
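The flags in the Activation Checkpointing report map onto the `activation_checkpointing` section of a DeepSpeed config. A sketch of the section that would produce the all-False report above (key names from the DeepSpeed config schema; the values are simply read off this log, not a recommendation):

```python
ds_config_fragment = {
    "activation_checkpointing": {
        "partition_activations": False,            # "----Partition Activations False"
        "cpu_checkpointing": False,                # "CPU CHECKPOINTING False"
        "contiguous_memory_optimization": False,   # "----contiguous Memory Checkpointing False"
        "synchronize_checkpoint_boundary": False,  # "----Synchronization False"
        "profile": False,                          # "----Profiling time in checkpointing False"
    }
}
```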
[Rank 0] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5446.0 | max reserved: 5446.0
[Rank 1] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5446.0 | max reserved: 5446.0
[Rank 2] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.95751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 3] (after 123000 iterations) memory (MB) | allocated: 531.96142578125 | max allocated: 4038.45751953125 | reserved: 5478.0 | max reserved: 5478.0
[Rank 16] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4748.0 | max reserved: 4748.0
[Rank 17] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4620.0 | max reserved: 4620.0
[Rank 18] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4732.0 | max reserved: 4732.0
[Rank 19] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 3330.8955078125 | reserved: 4572.0 | max reserved: 4572.0
[Rank 32] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 33] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4348.0 | max reserved: 4348.0
[Rank 34] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4284.0 | max reserved: 4284.0
[Rank 35] (after 123000 iterations) memory (MB) | allocated: 402.46240234375 | max allocated: 2930.89501953125 | reserved: 4380.0 | max reserved: 4380.0
[Rank 48] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6710.0 | max reserved: 6710.0
[Rank 49] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
[Rank 50] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 6726.0 | max reserved: 6726.0
[Rank 51] (after 123000 iterations) memory (MB) | allocated: 2351.1142578125 | max allocated: 4321.55517578125 | reserved: 7326.0 | max reserved: 7326.0
iteration 123000/ 152972 | consumed samples: 57896384 | elapsed time per iteration (ms): 6031.8 | learning rate: 3.027E-05 | global batch size: 512 | lm loss: 2.759393E+00 | loss scale: 1048576.0 | grad norm: 89653.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 123000 | lm loss value: 2.709008E+00 | lm loss PPL: 1.501438E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 10:17:14,038] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step123000/mp_rank_00_model_states.pt
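Two of the quantities in these records can be cross-checked directly: the reported perplexity is just the exponential of the LM loss, and consumed samples advance by the global batch size on every iteration. Both checks reproduce the logged values:

```python
import math

# validation at iteration 123000: lm loss 2.709008 -> PPL 1.501438E+01
print(math.exp(2.709008))         # 15.014383...

# consumed samples between the 123000 record and the next logged record (123200):
print(57_998_784 - 57_896_384)    # 102400
print(200 * 512)                  # 102400 == 200 iterations * global batch size 512
```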
successfully saved checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1542.81
iteration 123200/ 152972 | consumed samples: 57998784 | elapsed time per iteration (ms): 6821.1 | learning rate: 3.001E-05 | global batch size: 512 | lm loss: 2.760194E+00 | loss scale: 524288.0 | grad norm: 46507.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123400/ 152972 | consumed samples: 58101184 | elapsed time per iteration (ms): 5939.9 | learning rate: 2.976E-05 | global batch size: 512 | lm loss: 2.760050E+00 | loss scale: 524288.0 | grad norm: 49378.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123600/ 152972 | consumed samples: 58203584 | elapsed time per iteration (ms): 5938.9 | learning rate: 2.950E-05 | global batch size: 512 | lm loss: 2.762291E+00 | loss scale: 524288.0 | grad norm: 48514.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 123800/ 152972 | consumed samples: 58305984 | elapsed time per iteration (ms): 5926.4 | learning rate: 2.925E-05 | global batch size: 512 | lm loss: 2.760939E+00 | loss scale: 131072.0 | grad norm: 12685.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-06 11:56:08,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=124000, skipped=275, lr=[2.9001601166318924e-05, 2.9001601166318924e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 124000/ 152972 | consumed samples: 58408384 | elapsed time per iteration (ms): 5925.2 | learning rate: 2.900E-05 | global batch size: 512 | lm loss: 2.758842E+00 | loss scale: 131072.0 | grad norm: 12178.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
steps: 124000 loss: 2.7366 iter time (s): 0.003 samples/sec: 173043.323
--------------------------------------------------------------------------------------------------
 validation loss at iteration 124000 | lm loss value: 2.708781E+00 | lm loss PPL: 1.501097E+01 |
--------------------------------------------------------------------------------------------------
iteration 124200/ 152972 | consumed samples: 58510784 | elapsed time per iteration (ms): 6802.3 | learning rate: 2.875E-05 | global batch size: 512 | lm loss: 2.759311E+00 | loss scale: 131072.0 | grad norm: 12784.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 124400/ 152972 | consumed samples: 58613184 | elapsed time per iteration (ms): 5936.4 | learning rate: 2.850E-05 | global batch size: 512 | lm loss: 2.763126E+00 | loss scale: 262144.0 | grad norm: 26823.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 12:48:30,335] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step124500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1540.87
iteration 124600/ 152972 | consumed samples: 58715584 | elapsed time per iteration (ms): 5944.2 | learning rate: 2.826E-05 | global batch size: 512 | lm loss: 2.762295E+00 | loss scale: 262144.0 | grad norm: 62806.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
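The per-iteration wall time pins down end-to-end throughput, which is worth distinguishing from the much larger `samples/sec` figure in the `steps:` lines, which DeepSpeed computes from its own, much smaller `iter time (s)`. From the 124000 record above (sequence length 2048 assumed from the index filenames; 64 ranks as established earlier):

```python
global_batch, seq_len = 512, 2048     # seq_len assumed from the ...2048sl... filenames
iter_ms = 5925.2                      # iteration 124000 record above

print(global_batch / (iter_ms / 1000))            # ~86.4 samples/s end-to-end
print(global_batch * seq_len / (iter_ms / 1000))  # ~177k tokens/s across the 64-GPU job

# DeepSpeed's own line divides by its internal iter time instead:
print(512 / 0.003)                                # ~170,667, the ballpark of 173043.323
```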
iteration 124800/ 152972 | consumed samples: 58817984 | elapsed time per iteration (ms): 5943.0 | learning rate: 2.801E-05 | global batch size: 512 | lm loss: 2.759349E+00 | loss scale: 524288.0 | grad norm: 51084.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125000/ 152972 | consumed samples: 58920384 | elapsed time per iteration (ms): 5944.2 | learning rate: 2.777E-05 | global batch size: 512 | lm loss: 2.760069E+00 | loss scale: 524288.0 | grad norm: 51158.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 125000 | lm loss value: 2.707572E+00 | lm loss PPL: 1.499282E+01 |
--------------------------------------------------------------------------------------------------
iteration 125200/ 152972 | consumed samples: 59022784 | elapsed time per iteration (ms): 6813.9 | learning rate: 2.752E-05 | global batch size: 512 | lm loss: 2.760716E+00 | loss scale: 524288.0 | grad norm: 48770.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125400/ 152972 | consumed samples: 59125184 | elapsed time per iteration (ms): 5941.8 | learning rate: 2.728E-05 | global batch size: 512 | lm loss: 2.758474E+00 | loss scale: 1048576.0 | grad norm: 105539.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125600/ 152972 | consumed samples: 59227584 | elapsed time per iteration (ms): 5938.6 | learning rate: 2.704E-05 | global batch size: 512 | lm loss: 2.754835E+00 | loss scale: 524288.0 | grad norm: 52023.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 125800/ 152972 | consumed samples: 59329984 | elapsed time per iteration (ms): 5942.0 | learning rate: 2.681E-05 | global batch size: 512 | lm loss: 2.761851E+00 | loss scale: 524288.0 | grad norm: 51035.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
[2021-10-06 15:20:00,975] [INFO] [logging.py:68:log_dist] [Rank 0] step=126000, skipped=277, lr=[2.656847686054869e-05, 2.656847686054869e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 126000 loss: 2.7920 iter time (s): 0.003 samples/sec: 172280.388
iteration 126000/ 152972 | consumed samples: 59432384 | elapsed time per iteration (ms): 5956.8 | learning rate: 2.657E-05 | global batch size: 512 | lm loss: 2.759366E+00 | loss scale: 524288.0 | grad norm: 52868.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 126000 | lm loss value: 2.705758E+00 | lm loss PPL: 1.496565E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 15:22:56,450] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step126000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
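The loss scale bouncing between 131072 and 1048576, together with the rising `skipped=` counter in the step lines (275 at step 124000, 277 at 126000), is ordinary fp16 dynamic loss scaling: a step whose gradients overflow is skipped and the scale halved, and after a window of clean steps the scale is doubled again. A minimal sketch of that policy (generic, not DeepSpeed's exact implementation; the window length is an assumption):

```python
class DynamicLossScaler:
    def __init__(self, init_scale=2.0**20, window=1000):  # 2**20 == 1048576, as in the log
        self.scale, self.window = init_scale, window
        self.good_steps, self.skipped = 0, 0

    def update(self, overflow: bool) -> bool:
        """Return True if the optimizer step should run, False if it must be skipped."""
        if overflow:
            self.scale /= 2          # back off and skip this step
            self.good_steps = 0
            self.skipped += 1        # mirrors the cumulative skipped= counter
            return False
        self.good_steps += 1
        if self.good_steps == self.window:
            self.scale *= 2          # ramp back up after a clean window
            self.good_steps = 0
        return True
```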
time (ms) | save-checkpoint: 1650.59
iteration 126200/ 152972 | consumed samples: 59534784 | elapsed time per iteration (ms): 6830.7 | learning rate: 2.633E-05 | global batch size: 512 | lm loss: 2.760488E+00 | loss scale: 1048576.0 | grad norm: 103929.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126400/ 152972 | consumed samples: 59637184 | elapsed time per iteration (ms): 5955.4 | learning rate: 2.610E-05 | global batch size: 512 | lm loss: 2.759263E+00 | loss scale: 1048576.0 | grad norm: 103457.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126600/ 152972 | consumed samples: 59739584 | elapsed time per iteration (ms): 5962.9 | learning rate: 2.587E-05 | global batch size: 512 | lm loss: 2.757700E+00 | loss scale: 1048576.0 | grad norm: 101368.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 126800/ 152972 | consumed samples: 59841984 | elapsed time per iteration (ms): 5950.5 | learning rate: 2.564E-05 | global batch size: 512 | lm loss: 2.758351E+00 | loss scale: 1048576.0 | grad norm: 103060.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127000/ 152972 | consumed samples: 59944384 | elapsed time per iteration (ms): 5948.5 | learning rate: 2.541E-05 | global batch size: 512 | lm loss: 2.759777E+00 | loss scale: 1048576.0 | grad norm: 95773.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 127000 | lm loss value: 2.704773E+00 | lm loss PPL: 1.495092E+01 |
--------------------------------------------------------------------------------------------------
iteration 127200/ 152972 | consumed samples: 60046784 | elapsed time per iteration (ms): 6838.1 | learning rate: 2.518E-05 | global batch size: 512 | lm loss: 2.758083E+00 | loss scale: 524288.0 | grad norm: 50966.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127400/ 152972 | consumed samples: 60149184 | elapsed time per iteration (ms): 5943.0 | learning rate: 2.496E-05 | global batch size: 512 | lm loss: 2.757971E+00 | loss scale: 262144.0 | grad norm: 25385.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
saving checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 17:54:41,502] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step127500/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1602.65
iteration 127600/ 152972 | consumed samples: 60251584 | elapsed time per iteration (ms): 5951.3 | learning rate: 2.473E-05 | global batch size: 512 | lm loss: 2.755850E+00 | loss scale: 262144.0 | grad norm: 26370.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 127800/ 152972 | consumed samples: 60353984 | elapsed time per iteration (ms): 5946.7 | learning rate: 2.451E-05 | global batch size: 512 | lm loss: 2.756603E+00 | loss scale: 524288.0 | grad norm: 53169.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
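Across this span the checkpoint and evaluation rhythm is perfectly regular, which is what fixed Megatron-LM `--save-interval`/`--eval-interval` settings would produce (those are Megatron-LM's flag names; the values below are just read off the log):

```python
saves = [123000, 124500, 126000, 127500]
evals = [123000, 124000, 125000, 126000, 127000]

print({b - a for a, b in zip(saves, saves[1:])})  # {1500} -> save every 1500 iterations
print({b - a for a, b in zip(evals, evals[1:])})  # {1000} -> validate every 1000 iterations
```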
[2021-10-06 18:44:13,466] [INFO] [logging.py:68:log_dist] [Rank 0] step=128000, skipped=282, lr=[2.429040302651653e-05, 2.429040302651653e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 128000 loss: 2.7372 iter time (s): 0.003 samples/sec: 172268.475
iteration 128000/ 152972 | consumed samples: 60456384 | elapsed time per iteration (ms): 5935.4 | learning rate: 2.429E-05 | global batch size: 512 | lm loss: 2.756633E+00 | loss scale: 524288.0 | grad norm: 49358.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 128000 | lm loss value: 2.706122E+00 | lm loss PPL: 1.497111E+01 |
--------------------------------------------------------------------------------------------------
iteration 128200/ 152972 | consumed samples: 60558784 | elapsed time per iteration (ms): 6813.2 | learning rate: 2.407E-05 | global batch size: 512 | lm loss: 2.760710E+00 | loss scale: 524288.0 | grad norm: 50175.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128400/ 152972 | consumed samples: 60661184 | elapsed time per iteration (ms): 5931.2 | learning rate: 2.385E-05 | global batch size: 512 | lm loss: 2.758769E+00 | loss scale: 524288.0 | grad norm: 50632.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128600/ 152972 | consumed samples: 60763584 | elapsed time per iteration (ms): 5938.6 | learning rate: 2.364E-05 | global batch size: 512 | lm loss: 2.756382E+00 | loss scale: 1048576.0 | grad norm: 103854.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 128800/ 152972 | consumed samples: 60865984 | elapsed time per iteration (ms): 5932.0 | learning rate: 2.342E-05 | global batch size: 512 | lm loss: 2.758448E+00 | loss scale: 524288.0 | grad norm: 47823.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
iteration 129000/ 152972 | consumed samples: 60968384 | elapsed time per iteration (ms): 5932.3 | learning rate: 2.321E-05 | global batch size: 512 | lm loss: 2.756409E+00 | loss scale: 524288.0 | grad norm: 50102.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
--------------------------------------------------------------------------------------------------
 validation loss at iteration 129000 | lm loss value: 2.701550E+00 | lm loss PPL: 1.490281E+01 |
--------------------------------------------------------------------------------------------------
saving checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
[2021-10-06 20:28:58,410] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step129000/mp_rank_00_model_states.pt
successfully saved checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints
time (ms) | save-checkpoint: 1530.38
iteration 129200/ 152972 | consumed samples: 61070784 | elapsed time per iteration (ms): 6814.4 | learning rate: 2.300E-05 | global batch size: 512 | lm loss: 2.754760E+00 | loss scale: 262144.0 | grad norm: 24701.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms)
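The learning rate in the step lines decays smoothly (2.900E-05 at step 124000, 2.657E-05 at 126000, 2.429E-05 at 128000), with the per-step decrement shrinking as training approaches its 152972-iteration horizon; that curvature is consistent with a cosine schedule, though this excerpt does not state the schedule type. A generic cosine-decay sketch (lr_max, lr_min, and the warmup/decay boundaries are placeholders, not values recovered from this log):

```python
import math

def cosine_lr(step: int, lr_max: float, lr_min: float, warmup: int, decay_steps: int) -> float:
    """Linear warmup to lr_max, then cosine decay to lr_min over decay_steps."""
    if step < warmup:
        return lr_max * step / warmup
    progress = min((step - warmup) / (decay_steps - warmup), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```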
| grad norm: 24701.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129400/ 152972 | consumed samples: 61173184 | elapsed time per iteration (ms): 5981.4 | learning rate: 2.279E-05 | global batch size: 512 | lm loss: 2.755341E+00 | loss scale: 262144.0 | grad norm: 27424.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129600/ 152972 | consumed samples: 61275584 | elapsed time per iteration (ms): 5932.1 | learning rate: 2.258E-05 | global batch size: 512 | lm loss: 2.758741E+00 | loss scale: 262144.0 | grad norm: 24444.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 129800/ 152972 | consumed samples: 61377984 | elapsed time per iteration (ms): 5928.9 | learning rate: 2.237E-05 | global batch size: 512 | lm loss: 2.757538E+00 | loss scale: 524288.0 | grad norm: 50401.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-06 22:08:00,112] [INFO] [logging.py:68:log_dist] [Rank 0] step=130000, skipped=286, lr=[2.2166984919676447e-05, 2.2166984919676447e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 130000 loss: 2.7897 iter time (s): 0.003 samples/sec: 172820.399 iteration 130000/ 152972 | consumed samples: 61480384 | elapsed time per iteration (ms): 5929.0 | learning rate: 2.217E-05 | global batch size: 512 | lm loss: 2.755861E+00 | loss scale: 524288.0 | grad norm: 50825.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 130000 | lm loss value: 2.701914E+00 | lm loss PPL: 1.490823E+01 | -------------------------------------------------------------------------------------------------- iteration 130200/ 152972 | consumed samples: 61582784 | elapsed time per iteration (ms): 6788.8 | learning rate: 2.196E-05 | global batch size: 512 | lm loss: 2.755097E+00 | loss scale: 1048576.0 | grad norm: 98464.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 130400/ 152972 | consumed samples: 61685184 | elapsed time per iteration (ms): 5927.4 | learning rate: 2.176E-05 | global batch size: 512 | lm loss: 2.756150E+00 | loss scale: 1048576.0 | grad norm: 101956.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-06 23:00:16,908] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step130500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1747.36 iteration 130600/ 152972 | consumed samples: 61787584 | elapsed time per iteration (ms): 5940.3 | learning rate: 2.156E-05 | global batch size: 512 | lm loss: 2.756725E+00 | loss scale: 1048576.0 | grad norm: 100560.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 130800/ 152972 | consumed samples: 61889984 | elapsed time per iteration (ms): 5937.6 | learning rate: 2.136E-05 | global batch size: 512 | lm loss: 2.758968E+00 | loss scale: 1048576.0 | grad norm: 
104625.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131000/ 152972 | consumed samples: 61992384 | elapsed time per iteration (ms): 5931.4 | learning rate: 2.117E-05 | global batch size: 512 | lm loss: 2.752789E+00 | loss scale: 524288.0 | grad norm: 52719.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 131000 | lm loss value: 2.697248E+00 | lm loss PPL: 1.483883E+01 | -------------------------------------------------------------------------------------------------- iteration 131200/ 152972 | consumed samples: 62094784 | elapsed time per iteration (ms): 6797.1 | learning rate: 2.097E-05 | global batch size: 512 | lm loss: 2.753011E+00 | loss scale: 524288.0 | grad norm: 49199.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131400/ 152972 | consumed samples: 62197184 | elapsed time per iteration (ms): 5937.3 | learning rate: 2.078E-05 | global batch size: 512 | lm loss: 2.755176E+00 | loss scale: 524288.0 | grad norm: 51159.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131600/ 152972 | consumed samples: 62299584 | elapsed time per iteration (ms): 5933.5 | learning rate: 2.058E-05 | global batch size: 512 | lm loss: 2.754171E+00 | loss scale: 262144.0 | grad norm: 25637.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 131800/ 152972 | consumed samples: 62401984 | elapsed time per iteration (ms): 5926.8 | learning rate: 2.039E-05 | global batch size: 512 | lm loss: 2.753537E+00 | loss scale: 262144.0 | grad norm: 25870.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-07 01:31:30,179] [INFO] [logging.py:68:log_dist] [Rank 0] step=132000, skipped=290, lr=[2.020350269051709e-05, 2.020350269051709e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 132000 loss: 2.7851 iter time (s): 0.003 samples/sec: 173016.507 iteration 132000/ 152972 | consumed samples: 62504384 | elapsed time per iteration (ms): 5930.1 | learning rate: 2.020E-05 | global batch size: 512 | lm loss: 2.753011E+00 | loss scale: 524288.0 | grad norm: 48795.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 132000 | lm loss value: 2.701464E+00 | lm loss PPL: 1.490153E+01 | -------------------------------------------------------------------------------------------------- saving checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 01:34:24,128] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step132000/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1545.69 iteration 132200/ 152972 | consumed samples: 62606784 | elapsed time per iteration (ms): 6803.1 | learning rate: 2.002E-05 | global batch size: 512 | lm loss: 2.753535E+00 | loss scale: 524288.0 | grad norm: 49517.665 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132400/ 152972 | consumed samples: 62709184 | elapsed time per iteration (ms): 5927.7 | learning rate: 1.983E-05 | global batch size: 512 | lm loss: 2.756954E+00 | loss scale: 524288.0 | grad norm: 52285.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132600/ 152972 | consumed samples: 62811584 | elapsed time per iteration (ms): 5922.9 | learning rate: 1.965E-05 | global batch size: 512 | lm loss: 2.753035E+00 | loss scale: 1048576.0 | grad norm: 99811.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 132800/ 152972 | consumed samples: 62913984 | elapsed time per iteration (ms): 5929.4 | learning rate: 1.946E-05 | global batch size: 512 | lm loss: 2.753202E+00 | loss scale: 1048576.0 | grad norm: 105095.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133000/ 152972 | consumed samples: 63016384 | elapsed time per iteration (ms): 5948.6 | learning rate: 1.928E-05 | global batch size: 512 | lm loss: 2.753857E+00 | loss scale: 1048576.0 | grad norm: 102949.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 133000 | lm loss value: 2.698569E+00 | lm loss PPL: 1.485845E+01 | -------------------------------------------------------------------------------------------------- iteration 133200/ 152972 | consumed samples: 63118784 | elapsed time per iteration (ms): 6799.9 | learning rate: 1.910E-05 | global batch size: 512 | lm loss: 2.752545E+00 | loss scale: 1048576.0 | grad norm: 100065.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133400/ 152972 | consumed samples: 63221184 | elapsed time per iteration (ms): 5926.0 | learning rate: 1.893E-05 | global batch size: 512 | lm loss: 2.751432E+00 | loss scale: 524288.0 | grad norm: 49715.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 04:05:34,648] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step133500/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1697.69 iteration 133600/ 152972 | consumed samples: 63323584 | elapsed time per iteration (ms): 5933.4 | learning rate: 1.875E-05 | global batch size: 512 | lm loss: 2.754133E+00 | loss scale: 262144.0 | grad norm: 25065.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 133800/ 152972 | consumed samples: 63425984 | elapsed time per iteration (ms): 5927.8 | learning rate: 1.858E-05 | global batch size: 512 | lm loss: 2.751666E+00 | loss scale: 262144.0 | grad norm: 25053.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) [2021-10-07 04:55:00,030] [INFO] [logging.py:68:log_dist] [Rank 0] step=134000, skipped=293, lr=[1.8402887422878076e-05, 1.8402887422878076e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 134000 loss: 2.7822 
iter time (s): 0.003 samples/sec: 173083.763 iteration 134000/ 152972 | consumed samples: 63528384 | elapsed time per iteration (ms): 5930.6 | learning rate: 1.840E-05 | global batch size: 512 | lm loss: 2.752799E+00 | loss scale: 524288.0 | grad norm: 50094.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) -------------------------------------------------------------------------------------------------- validation loss at iteration 134000 | lm loss value: 2.698449E+00 | lm loss PPL: 1.485667E+01 | -------------------------------------------------------------------------------------------------- iteration 134200/ 152972 | consumed samples: 63630784 | elapsed time per iteration (ms): 6792.7 | learning rate: 1.823E-05 | global batch size: 512 | lm loss: 2.751855E+00 | loss scale: 524288.0 | grad norm: 49151.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) iteration 134400/ 152972 | consumed samples: 63733184 | elapsed time per iteration (ms): 5936.0 | learning rate: 1.806E-05 | global batch size: 512 | lm loss: 2.751402E+00 | loss scale: 524288.0 | grad norm: 50151.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | time (ms) saving checkpoint at iteration 134528 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints [2021-10-07 05:50:05,002] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints/global_step134528/mp_rank_00_model_states.pt successfully saved checkpoint at iteration 134528 to /gpfsscratch/rech/six/commun/synched_exps/tr4c-1B3-rotary-oscar/checkpoints time (ms) | save-checkpoint: 1703.80 [exiting program after 1190.076720392704 minutes] datetime: 2021-10-07 05:50:06
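A quick consistency check on the numbers above: the logged `lm loss PPL` is simply the exponential of the `lm loss value`, and `consumed samples` advances by global batch size times logging interval between entries (512 × 200 = 102400). A minimal check against the iteration-134000 block, using values copied from the log:

```python
import math

# Validation at iteration 134000: lm loss value 2.698449E+00,
# reported lm loss PPL 1.485667E+01.
print(math.exp(2.698449))                # -> ~14.8567, matching the logged PPL

# Consumed-samples delta between iterations 134000 and 134200:
# 512 (global batch size) * 200 (logging interval) samples.
print(63630784 - 63528384 == 512 * 200)  # -> True
```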
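The `loss scale` column bouncing between 262144.0, 524288.0, and 1048576.0, together with the DeepSpeed `skipped=` counter creeping up (282 → 286 → 290 → 293 between steps 128000 and 134000), is the usual signature of dynamic loss scaling in fp16 training: a step whose gradients overflow is skipped and the scale is halved, and after a window of overflow-free steps the scale is doubled again. A minimal sketch of that mechanism, assuming a growth window of 1000 steps (illustrative only, not DeepSpeed's actual implementation):

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: halve on overflow, double after a clean window."""

    def __init__(self, init_scale=2**19, growth_interval=1000):
        self.scale = float(init_scale)          # 2**19 = 524288.0, as in the log
        self.growth_interval = growth_interval  # assumed window; DeepSpeed's is configurable
        self._clean_steps = 0
        self.skipped = 0                        # analogous to the skipped= counter above

    def update(self, found_overflow: bool) -> bool:
        """Return True if the optimizer step should be applied, False if skipped."""
        if found_overflow:
            self.scale /= 2                     # back off; this step is skipped
            self._clean_steps = 0
            self.skipped += 1
            return False
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            self.scale *= 2                     # probe a larger scale again
            self._clean_steps = 0
        return True
```

With occasional overflows, such a scaler oscillates between a few adjacent powers of two, which is consistent with the 262144 / 524288 / 1048576 values seen in the entries above.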
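To turn a dump like this into loss and learning-rate curves, the `iteration .../ 152972` entries can be scraped with a regular expression. A sketch, assuming the log has been saved to a file named `train.log` (the filename and the choice of extracted fields are hypothetical):

```python
import re

# Pull (iteration, learning rate, lm loss) out of Megatron-style iteration lines.
PATTERN = re.compile(
    r"iteration\s+(\d+)/\s*\d+ \|.*?"
    r"learning rate: ([\d.E+-]+) \|.*?"
    r"lm loss: ([\d.E+-]+)"
)

def parse_log(path="train.log"):
    points = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                it, lr, loss = m.groups()
                points.append((int(it), float(lr), float(loss)))
    return points

# Usage: iterations, lrs, losses = zip(*parse_log())
```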