muellerzr committed
Commit 264a231
1 Parent(s): a21cc08
Files changed (2)
  1. index.html +2 -2
  2. llm_conf.qmd +2 -2
index.html CHANGED
@@ -512,11 +512,11 @@
  <ul>
  <li>No distributed techniques at play</li>
  </ul></li>
- <li>DDP:
+ <li>Distributed Data Parallelism (DDP):
  <ul>
  <li>A full copy of the model exists on each device, but data is chunked between each GPU</li>
  </ul></li>
- <li>FSDP &amp; DeepSpeed:
+ <li>Fully Sharded Data Parallelism (FSDP) &amp; DeepSpeed (DS):
  <ul>
  <li>Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs</li>
  </ul></li>
llm_conf.qmd CHANGED
@@ -61,9 +61,9 @@ What can we do?
 
 * Single GPU:
   * No distributed techniques at play
-* DDP:
+* Distributed Data Parallelism (DDP):
   * A full copy of the model exists on each device, but data is chunked between each GPU
-* FSDP & DeepSpeed:
+* Fully Sharded Data Parallelism (FSDP) & DeepSpeed (DS):
   * Split chunks of the model and optimizer states across GPUs, allowing for training bigger models on smaller (multiple) GPUs
 
 
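
The bullets changed above contrast single-GPU training, DDP (full model replica per device, data split across GPUs), and FSDP/DeepSpeed (model and optimizer states sharded across GPUs). A minimal sketch in plain PyTorch of how the DDP and FSDP wrappers differ; this is not the talk's own code, and the model, dataset, and launch command are placeholders:

```python
# Launch with e.g. `torchrun --nproc_per_node=2 sketch.py [--fsdp]`.
import os
import sys

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main(use_fsdp: bool = False):
    # One process per GPU; torchrun sets LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 2).cuda()  # stand-in for a real model

    if use_fsdp:
        # FSDP: parameters, gradients, and optimizer states are sharded
        # across GPUs, so each device holds only a slice of the model.
        model = FSDP(model)
    else:
        # DDP: every GPU keeps a full replica of the model; only the data
        # (and the gradient all-reduce) is split across devices.
        model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # DistributedSampler chunks the dataset so each rank sees a distinct shard.
    dataset = TensorDataset(torch.randn(1024, 512), torch.randint(0, 2, (1024,)))
    loader = DataLoader(dataset, batch_size=8, sampler=DistributedSampler(dataset))

    for inputs, labels in loader:
        inputs, labels = inputs.cuda(), labels.cuda()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main(use_fsdp="--fsdp" in sys.argv)
```

Either way the training loop itself is unchanged; the wrapper decides whether each GPU stores the whole model (DDP) or only a shard of its parameters and optimizer state (FSDP/DeepSpeed), which is what makes the latter suitable for models too large to fit on a single device.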