stas committed
Commit 61f2a7d
1 Parent(s): c7e377b
Files changed (1)
  1. README.md +13 -12
README.md CHANGED
@@ -15,7 +15,6 @@ Some cool model...
 
 - [Model Card for m4-80b](#model-card-for--model_id-)
 - [Table of Contents](#table-of-contents)
-- [Table of Contents](#table-of-contents-1)
 - [Model Details](#model-details)
 - [Model Description](#model-description)
 - [Uses](#uses)
@@ -57,15 +56,14 @@ Some cool model...
 <!-- Provide a longer summary of what this model is/does. -->
 Some cool model...
 
-- **Developed by:** More information needed
-- **Shared by [Optional]:** More information needed
-- **Model type:** Language model
+- **Developed by:** HuggingFace
+- **Model type:** Multi-modal model (text+image)
 - **Language(s) (NLP):** en
 - **License:** apache-2.0
-- **Parent Model:** More information needed
+- **Parent Model:** [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and [huggingface/llama-65b](https://huggingface.co/huggingface/llama-65b)
 - **Resources for more information:** More information needed
     - [GitHub Repo](https://github.com/huggingface/m4/)
-    - [Associated Paper](Flamingo)
+    - Associated Paper: [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
 
 # Uses
 
@@ -172,10 +170,9 @@ More information needed
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:** More information needed
-- **Hours used:** More information needed
-- **Cloud Provider:** More information needed
-- **Compute Region:** More information needed
+- **Hardware Type:** 64 nodes of 8x 80GB A100 GPUs, EFA network
+- **Hours used:** ~672 node hours
+- **Cloud Provider:** AWS SageMaker
 - **Carbon Emitted:** unknown
 
 # Technical Specifications [optional]
@@ -190,11 +187,15 @@ More information needed
 
 ### Hardware
 
-More information needed
+The training was performed on an AWS SageMaker cluster with 64 nodes of 8x 80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network, which provides about 340 GBps throughput.
+
+As the network is quite slow for the needs of DeepSpeed ZeRO-3, we were only able to clock ~90 TFLOPs.
+
 
 ### Software
 
-More information needed
+The training software is built on top of HuggingFace Transformers + Accelerate and DeepSpeed ZeRO-3, plus [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+
 
 # Citation
 
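A few notes on the sections added in this commit. The new compute figures make a carbon estimate possible in principle: ~672 node hours on 8-GPU nodes is roughly 5,376 A100 GPU-hours, which is the quantity the Machine Learning Impact calculator takes as input along with hardware type, provider and region. A minimal sketch of that conversion, using only the approximate numbers reported above:

```python
# Convert the reported "~672 node hours" into GPU-hours, the unit the
# Machine Learning Impact calculator expects. All inputs are approximate.
node_hours = 672        # ~672 node hours, as reported in the card
gpus_per_node = 8       # 8x 80GB A100 per node
gpu_hours = node_hours * gpus_per_node
print(f"~{gpu_hours} A100 GPU-hours")  # -> ~5376 A100 GPU-hours
```

Since the compute-region field was dropped and the emitted carbon is still listed as unknown, this is only an input to a future estimate, not an estimate itself.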
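The new Hardware section attributes the ~90 TFLOPs figure to DeepSpeed ZeRO-3 being limited by the EFA interconnect. For readers unfamiliar with the setup, here is a minimal ZeRO stage-3 configuration of the kind typically passed to the HF Trainer or Accelerate; the values are illustrative assumptions, not the settings used for this run:

```python
# Illustrative DeepSpeed ZeRO stage-3 configuration (NOT the actual m4-80b
# settings). With stage 3, parameters, gradients and optimizer states are all
# sharded across the GPUs, so inter-node bandwidth (EFA here) directly
# bounds the achievable per-GPU throughput.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,          # overlap communication with compute
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
# With transformers, such a dict can be passed as TrainingArguments(deepspeed=ds_config).
```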
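Likewise, the new Software section names Transformers + Accelerate, DeepSpeed ZeRO-3 and WebDataset for data loading. A hypothetical sketch of how those pieces usually fit together for streaming image-text pairs follows; the shard URLs, field names and batch size are placeholders, not the m4 training code:

```python
import webdataset as wds
from torch.utils.data import DataLoader
from accelerate import Accelerator

# Placeholder shard pattern -- the real dataset layout is not described in the card.
shards = "pipe:aws s3 cp s3://some-bucket/shards/{000000..000999}.tar -"

# WebDataset streams tar shards, so each rank reads its own shard subset
# instead of random-accessing a shared filesystem.
dataset = (
    wds.WebDataset(shards)
    .decode("pil")              # decode images to PIL
    .to_tuple("jpg", "txt")     # (image, caption) pairs; keys are assumptions
    .batched(8)
)
loader = DataLoader(dataset, batch_size=None, num_workers=4)

# Accelerate picks up the DeepSpeed ZeRO-3 plugin from its own config;
# the model and optimizer would be wrapped the same way:
accelerator = Accelerator()
# model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
```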