Philip May committed on
Commit
03e7c09
1 Parent(s): 974e918

Update README.md

  1. README.md +3 -0
README.md CHANGED
@@ -18,5 +18,8 @@ This model is too big to fit on a normal 16GB GPU in FP32 mode.
 For various reasons, T5 models cannot be trained in FP16 mode.
 However, mixed precision training is not yet supported on many GPUs.
 For example, it does not work on V100 GPUs. On A100, however, it does.
+
 That is why we suggest to use [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training.
 In particular, we recommend the [ZeRO-3 Example](https://huggingface.co/docs/transformers/main_classes/deepspeed#zero3-example) `auto` configuration.
+
+> ZeRO-Offload pushes the boundary of the maximum model size that can be trained efficiently using minimal GPU resources, by exploiting computational and memory resources on both GPUs and their host CPUs. see [ZeRO-Offload](https://www.deepspeed.ai/features/#zero-offload)
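
Note: the README recommends a ZeRO-3 `auto` configuration together with ZeRO-Offload but does not show one. The following is a minimal sketch of what such a configuration might look like, written as a small Python script that emits the JSON file. It is not the author's actual configuration. The file name `ds_config_zero3.json` and the helper names are hypothetical, the `"auto"` values assume the file is consumed by the Hugging Face `Trainer` (which fills them in from its own arguments), and the `fp16` section is deliberately left out because the README states that T5 cannot be trained in FP16.

```python
import json


def build_zero3_offload_config() -> dict:
    """Return a DeepSpeed config dict for ZeRO stage 3 with CPU offload.

    Assumption: values set to "auto" are resolved by the Hugging Face
    Trainer integration; plain DeepSpeed would need concrete numbers.
    """
    return {
        "zero_optimization": {
            # Stage 3 partitions optimizer state, gradients and parameters.
            "stage": 3,
            # ZeRO-Offload: keep optimizer state and parameters in host memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
        # No "fp16" section on purpose (see the README text in the diff).
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


def write_config(path: str) -> None:
    """Serialize the config to a JSON file that a launcher can read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_zero3_offload_config(), handle, indent=2)


if __name__ == "__main__":
    # Hypothetical output path; choose whatever the training script expects.
    write_config("ds_config_zero3.json")
```

With the Hugging Face `Trainer`, a file like this is typically passed through the `deepspeed` field of `TrainingArguments` or the `--deepspeed` command-line flag of a training script. Whether CPU offload is worthwhile depends on available host memory and the GPU in use.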