asi committed
Commit 33320f0
1 Parent(s): ffb62ef
Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -20,7 +20,7 @@ license: apache-2.0
 | Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
 | :------: | :---: | :---: | :---: | :---: |
 | `gpt-fr-cased-small` | 12 | 12 | 768 | 124 M |
-| `gpt-fr-cased-base` | 24 | 14 | 1792 | 1,017 B |
+| `gpt-fr-cased-base` | 24 | 14 | 1,792 | 1,017 B |

 ## Intended uses & limitations

@@ -70,7 +70,7 @@ We created a dedicated corpus to train our generative model. Indeed the model us

 ## Training procedure

-We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in ([Lacoste et al., 2019](lacoste-2019)).
+We pre-trained the model on the new CNRS (French National Centre for Scientific Research) [Jean Zay](http://www.idris.fr/eng/jean-zay/) supercomputer. We perform the training within a total of 140 hours of computation on Tesla V-100 hardware (TDP of 300W). The training was distributed on 4 compute nodes of 8 GPUs. We used data parallelization in order to divide each micro-batch on the computing units. We estimated the total emissions at 580.61 kgCO2eq, using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al., (2019)](lacoste-2019).

 ## Eval results

@@ -79,8 +79,8 @@ In line with the [WikiText](https://blog.einstein.ai/the-wikitext-long-term-depe

 ### BibTeX entry and citation info

-Along with the model host by HuggingFace transformers library, we maintain a [git repository](https://github.com/AntoineSimoulin/gpt-fr).
-If you use **GPT-fr** for your scientific publication or your industrial applications, please cite the following paper:
+Along with the model hosted by HuggingFace transformers library, we maintain a [git repository](https://github.com/AntoineSimoulin/gpt-fr).
+If you use **GPT-fr** for your scientific publications or your industrial applications, please cite the following paper:

 ```bibtex
 @inproceedings{simoulin_2020_gptfr,
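The 580.61 kgCO2eq figure in the training-procedure paragraph above can be sanity-checked with the same back-of-the-envelope formula the Machine Learning Impact calculator uses: energy in kWh multiplied by a grid carbon intensity. The sketch below assumes the 140 hours are wall-clock time shared by all 4 × 8 GPUs running at their 300 W TDP; that reading, and the back-solved intensity, are inferences, not something this commit states.

```python
# Rough sanity check of the reported 580.61 kgCO2eq, following the mlco2
# impact methodology: energy (kWh) x grid carbon intensity (kgCO2eq/kWh).
# Assumption (not stated in the card): 140 hours is wall-clock time across
# 4 nodes x 8 Tesla V100 GPUs, each drawing its 300 W TDP.
gpus = 4 * 8
hours = 140
tdp_kw = 0.300

energy_kwh = gpus * hours * tdp_kw        # 1344 kWh
implied_intensity = 580.61 / energy_kwh   # ~0.432 kgCO2eq per kWh
print(f"energy: {energy_kwh:.0f} kWh, implied intensity: {implied_intensity:.3f} kgCO2eq/kWh")
```

Under that reading the implied grid intensity comes out to roughly 0.43 kgCO2eq/kWh; the value is back-solved from the reported numbers, not quoted from the model card.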
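Since the card notes the model is hosted with the HuggingFace transformers library, a minimal usage sketch may help readers of this diff. The repository id `asi/gpt-fr-cased-base` and the GPT-2-style classes are assumptions based on the naming in the table above, not something this commit confirms.

```python
# Minimal sketch: load GPT-fr and sample a short continuation.
# Assumes the checkpoint is published as "asi/gpt-fr-cased-base" with
# GPT-2-compatible weights; adjust the model id if it differs.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("asi/gpt-fr-cased-base")
tokenizer = GPT2Tokenizer.from_pretrained("asi/gpt-fr-cased-base")

inputs = tokenizer("Longtemps je me suis couché de bonne heure.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```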