borzunov committed
Commit 0cc4f70
1 Parent(s): 69185c0

Explain changes and their meaning for Petals

Files changed (1): README.md +7 -3
README.md CHANGED
@@ -14,9 +14,13 @@ pipeline_tag: text-generation
 
 This repository contains the model from the [stabilityai/StableBeluga2](https://huggingface.co/stabilityai/StableBeluga2) repository with the following changes:
 
-- Storing weights in `bfloat16` instead of `float32` (2x smaller files)
-- Storing weights in small shards (< 1 GiB each)
-- Added a [Safetensors](https://github.com/huggingface/safetensors) version
+1. **Storing weights in `bfloat16` instead of `float32`.**
+   This leads to 2x smaller files and a small quality loss, which is not significant compared to the loss caused by the NF4 quantization used in Petals by default.
+1. **Storing weights in small shards.**
+   Each transformer block is stored in its own shard (1.71 GB each). The input and output embeddings and adjacent layernorms are in a separate shard (1.05 GB) too.
+   This way, Petals clients and servers don't have to download any excess data besides the layers they actually use.
+1. **Using [Safetensors](https://github.com/huggingface/safetensors) instead of Pickle.**
+   This allows faster loading with smaller RAM requirements.
 
 We provide the original README below. Please refer there for model details and licensing information.
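The `bfloat16` change in the diff above works because `bfloat16` keeps `float32`'s full 8-bit exponent and drops 16 of the 23 mantissa bits, so each value takes half the bytes while covering the same numeric range. A stdlib-only sketch of the bit-level relationship (function names are illustrative, and for simplicity it truncates where real converters round to nearest):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Pack x as a big-endian IEEE-754 float32, then keep only the top 16 bits:
    # 1 sign bit, 8 exponent bits, and the top 7 of the 23 mantissa bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # truncation for illustration; real converters round

def bfloat16_bits_to_float32(bits: int) -> float:
    # Re-expand by zero-filling the 16 dropped low mantissa bits.
    (x,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return x

# Half the storage, same exponent range, roughly 2-3 decimal digits of precision.
assert float32_to_bfloat16_bits(1.0) == 0x3F80
assert bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.0)) == 1.0
```

The small relative error per value (about 1 part in 256) is what the commit message refers to as quality loss that is negligible next to Petals' default NF4 quantization.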