Handle model parallelism

#4
by sgugger - opened

This PR adds a single line, following the pattern used by many models in Transformers, so that the model works with device_map="auto" during training.
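The diff itself is not reproduced here, so the following is only a minimal sketch of the kind of one-line change that usually makes a model trainable under device_map="auto". It assumes the fix is moving the labels onto the same device as the logits before computing the loss. The function name causal_lm_loss and the tensor shapes are hypothetical and are not taken from this repository.

```python
import torch
import torch.nn.functional as F


def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy that tolerates logits and labels on different devices."""
    # Assumed fix (hypothetical, not copied from the PR): with the model split
    # across devices, the logits come from the last shard while the labels may
    # still sit on the input device, so align them before computing the loss.
    labels = labels.to(logits.device)

    # Shift so that the prediction at position t is scored against token t + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )


# Tiny CPU-only demo with made-up shapes: batch of 2, sequence of 5, vocab of 11.
logits = torch.randn(2, 5, 11)
labels = torch.randint(0, 11, (2, 5))
loss = causal_lm_loss(logits, labels)
```

On a single device the .to() call is a no-op, so a change of this shape should not affect existing single-GPU or CPU training.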

