This repo contains serialized blobs of the up-projection layer of llama3-8B (oc=14336, ic=4096).
The linear layer has been quantized (GPTQ, W4 symmetric, group size 32) and sparsified by 50%.
```
├── sparse_w4
│   ├── linear_bitmap_int32.bin
│   ├── linear_compressed_qweight_int32.bin
│   ├── linear_nnz_int16.bin
│   ├── linear_scales_float16.bin
│   └── linear_zeros_int32.bin
```
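The authoritative layout of these blobs is defined by `unpack_blobs.py`, but one plausible reading of the sparsity metadata is: `linear_bitmap_int32.bin` stores one bit per input channel marking nonzero weight positions, and `linear_nnz_int16.bin` stores the nonzero count per row. A minimal sketch of recovering column indices from a bitmap row, assuming LSB-first bit order within each 32-bit word on a little-endian machine (both assumptions, not confirmed by this repo):

```python
import numpy as np

def bitmap_to_indices(bitmap_row: np.ndarray, ic: int = 4096) -> np.ndarray:
    """Recover nonzero column indices from one row of int32 bitmap words.

    Assumes (hypothetically) one bit per input channel, LSB-first within
    each 32-bit little-endian word; see unpack_blobs.py for the real layout.
    """
    bits = np.unpackbits(bitmap_row.view(np.uint8), bitorder="little")
    return np.flatnonzero(bits[:ic])

# toy example: mark columns 0, 5, and 33 as nonzero
row = np.zeros(4096 // 32, dtype=np.int32)
row[0] = (1 << 0) | (1 << 5)   # bits 0 and 5 of word 0
row[1] = 1 << 1                # bit 33 overall (word 1, bit 1)
print(bitmap_to_indices(row))  # nonzero columns: 0, 5, 33
```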
### Usage
The following script shows how to process the blobs in Python: unpacking, zero-location recovery, and weight dequantization.
```bash
python unpack_blobs.py
```
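For a sense of the dequantization step, here is a minimal sketch assuming the common GPTQ-style packing of eight 4-bit values per int32 word (low nibble first) and the usual symmetric-W4 convention of a zero point at 8. The packing order and zero-point handling here are assumptions; `unpack_blobs.py` is the reference:

```python
import numpy as np

def dequantize_w4(qweight: np.ndarray, scales: np.ndarray,
                  zeros: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Dequantize packed 4-bit weights: w = scale * (q - zero).

    qweight: int32, shape (oc, ic // 8), eight 4-bit values per word
             (low nibble first -- an assumed layout).
    scales:  float16, shape (oc, ic // group_size).
    zeros:   integer zero points, shape (oc, ic // group_size),
             assumed already unpacked (typically 8 for symmetric W4).
    """
    # unpack eight nibbles from each int32 word
    shifts = np.arange(8, dtype=np.uint32) * 4
    q = (qweight.view(np.uint32)[..., None] >> shifts) & 0xF
    q = q.reshape(*qweight.shape[:-1], -1).astype(np.int32)  # (oc, ic)
    # map each column to its quantization group, then rescale
    g = np.arange(q.shape[-1]) // group_size
    return scales[..., g].astype(np.float32) * (q - zeros[..., g])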
> You can ignore `internal/`.