This is a copy of the original 🌸 BLOOMChat weights, repackaged for more efficient use with DeepSpeed-Inference 🚀. In this repo the original tensors are split into 8 shards, one per GPU, so the model can be run with DeepSpeed-Inference tensor parallelism across 8 GPUs.
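A minimal sketch of loading the sharded checkpoint with DeepSpeed-Inference tensor parallelism. The repo id shown is a placeholder for this repository, and the dtype and prompt are assumptions; the script is meant to be launched with the DeepSpeed launcher so that one process is spawned per GPU (`deepspeed --num_gpus 8 run_bloomchat.py`).

```python
# Sketch: tensor-parallel inference over the 8 shards with DeepSpeed-Inference.
# Assumptions: repo id, fp16 dtype, and generation settings are illustrative.
import os

def main():
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "your-org/bloomchat-176b-8shards"  # placeholder repo id
    world_size = int(os.getenv("WORLD_SIZE", "1"))  # set by the deepspeed launcher

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

    # Split the model across the available GPUs (tensor parallelism).
    model = deepspeed.init_inference(
        model,
        mp_size=world_size,             # 8 when launched on 8 GPUs
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )

    inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=64)
    if int(os.getenv("RANK", "0")) == 0:  # print from one rank only
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

`mp_size` must match the number of shards (8 here); DeepSpeed then loads one shard per rank instead of materializing the full model on every GPU.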

For specific details about the BLOOMChat model itself, please see the original BLOOMChat model card.

This work was performed using AI/HPC resources (the Jean Zay supercomputer) from GENCI-IDRIS.
