borzunov committed
Commit 0cc4f70
1 Parent(s): 69185c0

Explain changes and their meaning for Petals

Files changed (1): README.md +7 -3
README.md CHANGED
@@ -14,9 +14,13 @@ pipeline_tag: text-generation
 
 This repository contains the model from the [stabilityai/StableBeluga2](https://huggingface.co/stabilityai/StableBeluga2) repository with the following changes:
 
-- Storing weights in `bfloat16` instead of `float32` (2x smaller files)
-- Storing weights in small shards (< 1 GiB each)
-- Added a [Safetensors](https://github.com/huggingface/safetensors) version
+1. **Storing weights in `bfloat16` instead of `float32`.**
+   This leads to 2x smaller files and a small quality loss, which is not significant compared to the loss caused by the NF4 quantization used in Petals by default.
+1. **Storing weights in small shards.**
+   Each transformer block is stored in its own shard (1.71 GB each). The input and output embeddings and adjacent layernorms are in a separate shard (1.05 GB) too.
+   This way, Petals clients and servers don't have to download any excess data besides the layers they actually use.
+1. **Using [Safetensors](https://github.com/huggingface/safetensors) instead of Pickle.**
+   This allows faster loading with smaller RAM requirements.
 
 We provide the original README below. Please refer there for model details and licensing information.
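The `bfloat16` change in the diff above works because `bfloat16` keeps `float32`'s full 8-bit exponent and drops 16 of the 23 mantissa bits, so each value takes half the bytes while covering the same numeric range. A stdlib-only sketch of the bit-level relationship (function names are illustrative, and for simplicity it truncates where real converters round to nearest):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Pack x as a big-endian IEEE-754 float32, then keep only the top 16 bits:
    # 1 sign bit, 8 exponent bits, and the top 7 of the 23 mantissa bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # truncation for illustration; real converters round

def bfloat16_bits_to_float32(bits: int) -> float:
    # Re-expand by zero-filling the 16 dropped low mantissa bits.
    (x,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return x

# Half the storage, same exponent range, roughly 2-3 decimal digits of precision.
assert float32_to_bfloat16_bits(1.0) == 0x3F80
assert bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.0)) == 1.0
```

The small relative error per value (about 1 part in 256) is what the commit message refers to as quality loss that is negligible next to Petals' default NF4 quantization.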