How to split tensors into x shards?

#1
by Ede-CH - opened

Can you provide the script for splitting the original tensors into 8 shards?

If you only want to run inference, you can pass mp_size=8 directly to deepspeed.init_inference().
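A minimal sketch of that call, for illustration only. The helper name `init_tensor_parallel` and the `dtype=torch.half` choice are assumptions, not part of this repository, and `model` is assumed to be an already-loaded `torch.nn.Module`.

```python
def init_tensor_parallel(model, mp_size=8):
    """Wrap an already-loaded model for tensor-parallel inference (sketch)."""
    # Imported lazily so this file can be imported on a machine without GPUs.
    import torch
    import deepspeed

    # mp_size is the tensor-parallel degree; dtype=torch.half is an assumption.
    engine = deepspeed.init_inference(model, mp_size=mp_size, dtype=torch.half)
    # The wrapped model is exposed on the returned engine as `.module`.
    return engine.module
```

A script that uses this would typically be started with the DeepSpeed launcher, for example `deepspeed --num_gpus 8 your_script.py`, so that one process runs per GPU.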

Thanks for your reply! As I understand it, this parameter splits the model weights into eight parts for tensor parallelism (TP) only after the full weights have been loaded. Because the weights were not sharded for TP beforehand, loading can take quite a long time. In the weight files you provide, each file stores only a slice of each matrix, so every rank can load its part directly. Could you please share the script you used to pre-shard the weights?
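To make the request concrete, here is a framework-free sketch of the slicing arithmetic that pre-sharding involves. It is not the maintainers' script. The helper names `shard_bounds` and `shard_matrix` are hypothetical, and which dimension each layer is split along depends on the model and is assumed here.

```python
def shard_bounds(size, num_shards, rank):
    """Return the [start, stop) index range owned by `rank` along one dimension."""
    if size % num_shards != 0:
        raise ValueError(f"size {size} is not divisible by {num_shards} shards")
    step = size // num_shards
    return rank * step, (rank + 1) * step


def shard_matrix(matrix, num_shards, rank, dim):
    """Slice a 2-D weight (list of rows) for one rank along dim 0 (rows) or dim 1 (columns)."""
    rows, cols = len(matrix), len(matrix[0])
    if dim == 0:
        start, stop = shard_bounds(rows, num_shards, rank)
        return [row[:] for row in matrix[start:stop]]
    if dim == 1:
        start, stop = shard_bounds(cols, num_shards, rank)
        return [row[start:stop] for row in matrix]
    raise ValueError("dim must be 0 or 1")


# Toy 4x8 weight, split column-wise across 4 ranks (each rank gets a 4x2 slice).
weight = [[r * 8 + c for c in range(8)] for r in range(4)]
shards = [shard_matrix(weight, 4, rank, dim=1) for rank in range(4)]
```

With real checkpoints, the same slicing would be applied to each tensor and every rank's slices saved to a separate file, so that a rank reads only its own file at load time.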
