VictorSanh committed
Commit 3bcd940
Parent: 1648537

Update readme and doc from the 80b repo

Files changed (1): README.md +6 -2
README.md CHANGED
@@ -305,11 +305,15 @@ Similarly to the base IDEFICS models, we performed checkpoint selection to stop
 
 ## Hardware
 
-The IDEFICS models were trained on an AWS SageMaker cluster using at the maximum 64 nodes of 8x80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network. IDEFICS-80b was trained for approximately 672 node hours. IDEFICS-80b-instruct was trained for approximately 3 days on 48 nodes.
+The IDEFICS models were trained on an AWS SageMaker cluster of 8x80GB A100 GPU nodes with an EFA network.
+
+- IDEFICS-80B took ~28 days of training on 64 nodes (512 GPUs).
+- IDEFICS-80b-instruct was fine-tuned from the base model for ~3 days on 48 nodes (384 GPUs).
+
 
 ## Software
 
-The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3 for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+The training software is built on top of HuggingFace Transformers + Accelerate, with [DeepSpeed ZeRO-3](https://github.com/microsoft/DeepSpeed) for training and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
 
 
 # Bias, Risks, and Limitations
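
Note: from the figures in the updated Hardware section, 64 nodes × 8 GPUs = 512 GPUs, and ~28 days ≈ 672 wall-clock hours, i.e. on the order of 512 × 672 ≈ 344k A100-hours for the base 80B run. Below is a minimal, hypothetical sketch of how the stack named in the updated Software section typically fits together (Accelerate driving DeepSpeed ZeRO-3, WebDataset streaming sharded tar archives). It is not the IDEFICS training code; the shard pattern and all parameters are placeholders.

```python
# Hypothetical sketch of the stack named above -- not the IDEFICS training code.
# Assumes `pip install torch accelerate deepspeed webdataset` and a distributed
# launch (e.g. `accelerate launch`).
import torch
import webdataset as wds
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# DeepSpeed ZeRO-3 shards parameters, gradients, and optimizer state across
# all ranks, which is how an 80B-parameter model fits on 80GB A100s.
accelerator = Accelerator(deepspeed_plugin=DeepSpeedPlugin(zero_stage=3))

# WebDataset streams training samples from sharded tar archives
# (the shard pattern here is a placeholder).
dataset = (
    wds.WebDataset("shards/train-{000000..000099}.tar")
    .decode("pil")             # decode image entries to PIL images
    .to_tuple("jpg", "json")   # yield (image, metadata) pairs
)
loader = torch.utils.data.DataLoader(dataset, batch_size=None)

# A real run would build the model and optimizer with Transformers, then wrap:
# model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
```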