stas committed on
Commit
fa3444f
1 Parent(s): 424857d

add DeepSpeed as another solution for this huge model.

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -20,7 +20,8 @@ t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn =
 ```
 
 Secondly, a single GPU will most likely not have enough memory to even load the model into memory as the weights alone amount to over 40 GB.
-Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
+- Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
+- DeepSpeed's ZeRO-Offload is another approach as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
 
 ## [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
 
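For context, a minimal sketch (not part of the commit) of how the DeepSpeed ZeRO-Offload route mentioned in the added line could be wired up through the transformers Trainer integration. The config file name, its keys, and the `TrainingArguments` values are illustrative and depend on the DeepSpeed/transformers versions in use.

```python
# Illustrative only: load t5-11b and hand DeepSpeed a ZeRO config that
# offloads optimizer state to CPU RAM so the model can be trained on a
# single GPU. Config keys follow recent DeepSpeed releases and may differ
# in older ones (e.g. "cpu_offload": true instead of "offload_optimizer").
import json
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",  # filled in from TrainingArguments by the HF integration
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: optimizer state lives in CPU memory
    },
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

model = T5ForConditionalGeneration.from_pretrained("t5-11b")  # ~40 GB of weights

args = TrainingArguments(output_dir="out", deepspeed="ds_config.json")
trainer = Trainer(model=model, args=args)  # train_dataset/data_collator omitted for brevity
```

Training would then typically be started with the `deepspeed` launcher rather than plain `python`.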