10h run README
README.md
ADDED
@@ -0,0 +1,12 @@
CodeCarbon wasn't ready until the training was over, so we only did an additional 10h run to measure with and then extrapolate to the whole training.

This captures the startup time and 2499 iterations in 2 records, since an intermediate checkpoint was also saved halfway through and we flush the CodeCarbon records each time a checkpoint is saved.

The training had 168000 iterations, so multiply the reported data by 67 (168000 / 2499 ≈ 67.2) to estimate the whole training. This is only a rough approximation, since we used 16 nodes during the ramp-up, then 64, and 128 nodes only for the last 3 weeks.

Each CSV file contains the report for a single GPU.
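Since each file covers a single GPU, a run-level figure needs the per-GPU reports added together. The following is a minimal sketch of that step, not the script used for this run. The function name `sum_gpu_reports`, the directory layout, and the column names `energy_consumed` and `emissions` are assumptions; check them against the header row of the actual files. It also assumes each record covers only its own interval; if the records turn out to be cumulative, pass `cumulative=True` to use only the last record of each file.

```python
import csv
from pathlib import Path


def sum_gpu_reports(csv_dir, columns=("energy_consumed", "emissions"), cumulative=False):
    """Add up the chosen columns across all per-GPU CSV files in csv_dir.

    Column names are assumptions; adjust them to match the files' header row.
    """
    totals = {name: 0.0 for name in columns}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        if cumulative:
            # Assumption: the last record already includes the earlier ones.
            rows = rows[-1:]
        for row in rows:
            for name in columns:
                totals[name] += float(row[name])
    return totals
```

Calling `sum_gpu_reports("path/to/csv/dir")` returns a dictionary with one total per requested column for the 10h run, where the path is a placeholder for wherever the reports are stored.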
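The extrapolation from the 10h run to the whole training is a simple linear scaling by iteration count. The sketch below shows that arithmetic using the two iteration counts given in this README; the function name `extrapolate` is hypothetical. It deliberately ignores the changes in node count during training, so the result carries the same caveat as the factor of 67 itself.

```python
TOTAL_ITERATIONS = 168_000   # iterations in the whole training (from this README)
MEASURED_ITERATIONS = 2_499  # iterations covered by the 10h run (from this README)


def extrapolate(measured_value,
                measured_iterations=MEASURED_ITERATIONS,
                total_iterations=TOTAL_ITERATIONS):
    """Scale a value measured over the 10h run linearly to the whole training."""
    return measured_value * total_iterations / measured_iterations


scale = TOTAL_ITERATIONS / MEASURED_ITERATIONS
print(f"scale factor: {scale:.1f}")  # prints "scale factor: 67.2"
```

For example, `extrapolate(x)` applied to a total `x` measured over the 10h run gives the corresponding rough estimate for all 168000 iterations.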