Update README.md
README.md
|
CodeCarbon wasn't ready until the training was over, so we only did an additional 10-hour run to measure with it; from that run we can extrapolate to the whole training.

This set of records captures the startup time and 2499 iterations in 2 records per GPU: an intermediate checkpoint was also saved half-way, and we flush the CodeCarbon (CC) records every time a checkpoint is saved.

The training had 168000 iterations, so multiply the reported data by about 67 (168000 / 2499 ≈ 67.2) to estimate the whole training. This is only a rough approximation, since we used 16 nodes during the ramp-up, then 64, and 128 nodes only for the last 3 weeks.

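A minimal sketch of the extrapolation arithmetic, in Python: 168000 / 2499 ≈ 67.2, which rounds to the factor of 67 used above. `extrapolate` is a hypothetical helper, not part of CodeCarbon. It assumes every iteration costs the same as in the measured run, which the changing node counts make only roughly true, and since the records also include startup time, scaling them this way slightly over-counts the startup cost.

```python
TOTAL_ITERATIONS = 168_000   # iterations in the full training (from this README)
MEASURED_ITERATIONS = 2_499  # iterations covered by the 10h CodeCarbon run


def extrapolate(measured_value: float,
                total_iters: int = TOTAL_ITERATIONS,
                measured_iters: int = MEASURED_ITERATIONS) -> float:
    """Linearly scale a quantity measured over `measured_iters` iterations
    to `total_iters` iterations (hypothetical helper, not part of CodeCarbon).

    Assumes a constant cost per iteration, ignoring the different node
    counts used at different stages of the training.
    """
    if measured_iters <= 0:
        raise ValueError("measured_iters must be positive")
    return measured_value * total_iters / measured_iters


scale = TOTAL_ITERATIONS / MEASURED_ITERATIONS
print(f"scale factor: {scale:.2f}")  # scale factor: 67.23
```

For example, `extrapolate(energy_kwh)` scales an energy figure read from the reports to the full 168000 iterations.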
|
Caveat emptor: I'm not sure whether the CC reports overlap. Each report is per GPU, but apart from the GPU itself I think they may be measuring the same things, in which case summing them would over-count. This requires further research.

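One way to bracket the possible overlap, sketched under loud assumptions: suppose each per-GPU report carries `gpu_energy`, `cpu_energy` and `ram_energy` values (in kWh), and every GPU on a node repeats the same node-wide CPU and RAM figures. Then the naive sum over all reports is an upper bound, and counting the CPU and RAM part once per node gives a lower bound. `GpuReport`, `energy_bounds` and `gpus_per_node` are hypothetical names chosen for illustration; whether the overlap exists at all is exactly what still needs checking.

```python
from dataclasses import dataclass


@dataclass
class GpuReport:
    """Energy figures (kWh) from one per-GPU report.

    Hypothetical class; the field names mirror the assumed CSV columns
    gpu_energy, cpu_energy and ram_energy.
    """
    gpu_energy: float
    cpu_energy: float
    ram_energy: float


def energy_bounds(reports: list[GpuReport], gpus_per_node: int) -> tuple[float, float]:
    """Return (lower, upper) estimates of the total energy in kWh.

    upper: naive sum over all reports, correct if the reports do not overlap.
    lower: assumes every GPU on a node reports the same node-wide CPU and
           RAM energy, so that part is counted once per node, not per GPU.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    gpu_total = sum(r.gpu_energy for r in reports)
    host_total = sum(r.cpu_energy + r.ram_energy for r in reports)
    upper = gpu_total + host_total
    lower = gpu_total + host_total / gpus_per_node
    return lower, upper


# Example: one node with 4 GPUs, each report repeating the same host figures.
reports = [GpuReport(gpu_energy=1.0, cpu_energy=0.5, ram_energy=0.1) for _ in range(4)]
low, high = energy_bounds(reports, gpus_per_node=4)
print(f"{low:.2f} .. {high:.2f} kWh")  # 4.60 .. 6.40 kWh
```

If the assumptions hold, the true total lies between the two values; the gap is entirely the host (CPU and RAM) share.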
|
Each CSV file contains a report for a single GPU.

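A minimal sketch of aggregating the per-GPU CSV files with Python's standard `csv` module. It assumes the CodeCarbon columns `energy_consumed` (kWh) and `emissions` (kg CO2eq), plus a hypothetical `reports/*.csv` layout. Since each file holds 2 records (one flushed at the half-way checkpoint, one at the end), it also has to assume whether those rows are running totals or per-interval values; the `cumulative` flag makes that choice explicit rather than guessing. Summing across files is subject to the overlap caveat above.

```python
import csv
import glob
from typing import Iterable, Iterator

# Assumed CodeCarbon CSV columns: "energy_consumed" (kWh) and "emissions"
# (kg CO2eq). Check the header of your own files before relying on this.
COLUMNS = ("energy_consumed", "emissions")


def gpu_totals(lines: Iterable[str], cumulative: bool = True) -> dict[str, float]:
    """Totals for one per-GPU report, given its CSV lines (e.g. an open file).

    cumulative=True assumes each flushed row holds running totals since the
    tracker started, so only the last row counts. cumulative=False assumes
    each row covers only the interval since the previous flush, so rows are
    summed. Which of the two applies is an assumption to verify.
    """
    rows = list(csv.DictReader(lines))
    if not rows:
        return {c: 0.0 for c in COLUMNS}
    if cumulative:
        return {c: float(rows[-1][c]) for c in COLUMNS}
    return {c: sum(float(r[c]) for r in rows) for c in COLUMNS}


def all_gpu_totals(reports: Iterable[Iterable[str]], cumulative: bool = True) -> dict[str, float]:
    """Sum the per-GPU totals over every report (see the overlap caveat)."""
    total = {c: 0.0 for c in COLUMNS}
    for lines in reports:
        for c, value in gpu_totals(lines, cumulative).items():
            total[c] += value
    return total


def open_reports(pattern: str) -> Iterator[Iterable[str]]:
    """Yield each matching CSV file, closing it before opening the next."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            yield f


if __name__ == "__main__":
    # Hypothetical layout: one CSV file per GPU under reports/.
    print(all_gpu_totals(open_reports("reports/*.csv")))
```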
|