Add documentation items
Browse files
README.md
CHANGED
@@ -73,7 +73,7 @@ We created a dedicated corpus to train our generative model. Indeed the model us
|
|
73 |
|
74 |
## Training procedure
|
75 |
|
76 |
-
We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in (Lacoste et al., 2019).
|
77 |
|
78 |
## Eval results
|
79 |
|
@@ -95,3 +95,6 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe
|
|
95 |
><div name="li-2019">Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad: Findings of the First Shared Task on Machine Translation Robustness. WMT (2) 2019: 91-102</div>
|
96 |
|
97 |
><div name="shoeybi-2019">Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs/1909.08053 (2019)</div>
|
|
|
|
|
|
|
|
73 |
|
74 |
## Training procedure
|
75 |
|
76 |
+
We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in ([Lacoste et al., 2019](lacoste-2019)).
|
77 |
|
78 |
## Eval results
|
79 |
|
|
|
95 |
><div name="li-2019">Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad: Findings of the First Shared Task on Machine Translation Robustness. WMT (2) 2019: 91-102</div>
|
96 |
|
97 |
><div name="shoeybi-2019">Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro: Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs/1909.08053 (2019)</div>
|
98 |
+
|
99 |
+
><div name="lacoste-2019">Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, Thomas Dandres: Quantifying the Carbon Emissions of Machine Learning. CoRR abs/1910.09700 (2019)</div>
|
100 |
+
|