How to implement model parallelism in gpt-neox?

#5
by sunyt32 - opened

In gpt-j-6b, the code contains a `parallelize` function which makes it convenient to split the model across GPUs. It seems difficult to fit the entire model on one GPU. Or is there a more elegant way to do that?

EleutherAI org

This model does not fit on one GPU unless you have an 80 GB A100, an A40, or an A6000. It just barely fits for running inference in 48 GB and does not fit in 40 GB.
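A back-of-the-envelope calculation shows why those memory figures come out that way (the parameter count below is the published ~20.6B figure; the rest is simple arithmetic, not a measurement):

```python
# Rough memory estimate for GPT-NeoX-20B inference weights.
# ~20.6B parameters, 2 bytes each in fp16/bf16.
n_params = 20.6e9
bytes_per_param = 2  # fp16

weights_gib = n_params * bytes_per_param / 1024**3
print(f"fp16 weights alone: {weights_gib:.1f} GiB")  # ~38.4 GiB
```

Since the fp16 weights alone take roughly 38 GiB, a 40 GB card has no room left for activations and the KV cache, while a 48 GB card (A40/A6000) leaves only a few GiB of headroom. If you instead have several smaller GPUs, one option (assuming a recent `transformers` with `accelerate` installed) is to pass `device_map="auto"` to `AutoModelForCausalLM.from_pretrained`, which shards the layers across available devices and plays a role similar to gpt-j's `parallelize` helper.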

stellaathena changed discussion status to closed
