This repo contains serialized blobs for the up-projection layer of llama3-8B (oc=14336, ic=4096). The linear layer has been quantized with GPTQ (W4, symmetric, group size 32) and sparsified to 50% sparsity.

```
├── sparse_w4
│   ├── linear_bitmap_int32.bin
│   ├── linear_compressed_qweight_int32.bin
│   ├── linear_nnz_int16.bin
│   ├── linear_scales_float16.bin
│   └── linear_zeros_int32.bin
```

### Usage

The following script shows how to process the blobs in Python: it demonstrates unpacking, zero-location recovery, and weight dequantization.

```bash
python unpack_blobs.py
```

> You can ignore `internal/`.
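To illustrate the recovery steps that `unpack_blobs.py` performs, here is a minimal sketch of one row's pipeline on synthetic data. The exact on-disk conventions (nibble order within each int32 word, bit order of the bitmap, row layout) are assumptions here, as is the `dequantize_row` helper name; the real script should be treated as authoritative. The `linear_nnz_int16.bin` blob (per-row nonzero counts) would additionally be needed to locate each row's compressed codes within the full tensor, which a single-row sketch does not require.

```python
import numpy as np

GROUP_SIZE = 32  # quantization group size from the README

def unpack_int4(packed):
    # Eight 4-bit codes per int32 word, lowest nibble first (assumed order).
    shifts = np.arange(0, 32, 4, dtype=np.uint32)
    return ((packed.astype(np.uint32)[:, None] >> shifts) & 0xF).reshape(-1).astype(np.int32)

def expand_bitmap(words, n):
    # Bit i of word w covers column 32*w + i; a set bit means "weight kept" (assumed).
    bits = (words.astype(np.uint32)[:, None] >> np.arange(32, dtype=np.uint32)) & 1
    return bits.reshape(-1)[:n].astype(bool)

def dequantize_row(comp_words, bitmap_words, zeros, scales, ic):
    # Scatter the compressed codes back to their dense positions via the bitmap,
    # then dequantize per group as w = (q - z) * s; pruned positions stay exactly 0.
    mask = expand_bitmap(bitmap_words, ic)
    q = np.zeros(ic, dtype=np.int32)
    q[mask] = unpack_int4(comp_words)[:mask.sum()]
    g = np.arange(ic) // GROUP_SIZE
    w = (q - zeros[g]).astype(np.float32) * scales[g].astype(np.float32)
    w[~mask] = 0.0
    return w

# Synthetic single row: ic=64, keep every even column (50% sparsity).
ic = 64
bitmap = np.array([0x55555555, 0x55555555], dtype=np.uint32)
codes = (np.arange(32) % 16).astype(np.uint32)          # the 32 kept 4-bit codes
packed = np.zeros(4, dtype=np.uint32)
for j in range(8):                                      # pack 8 codes per word
    packed |= codes.reshape(-1, 8)[:, j] << np.uint32(4 * j)
zeros = np.full(ic // GROUP_SIZE, 8, dtype=np.int32)    # symmetric midpoint zero-point
scales = np.full(ic // GROUP_SIZE, 0.01, dtype=np.float16)
w = dequantize_row(packed, bitmap, zeros, scales, ic)
```

Keeping pruned positions as exact zeros (rather than `(0 - z) * s`) is what makes the bitmap necessary: the zero locations cannot be reconstructed from the compressed codes alone.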