This is a custom INT8 version of the original BLOOM weights to make it fast to use with the DeepSpeed-Inference engine which uses Tensor Parallelism. In this repo the tensors are split into 8 shards to target 8 GPUs.
The full BLOOM documentation is here.
To use the weights in repo, you can adapt to your needs the scripts found here (XXX: they are going to migrate soon to HF Transformers code base, so will need to update the link once moved).
- Downloads last month