|
|
|
This repo contains serialized blobs for the up-projection layer of Llama-3-8B (oc=14336, ic=4096).
|
The linear layer has been quantized (GPTQ W4 Sym with group size 32) and sparsified by 50%. |
|
|
|
```
├── sparse_w4
│   ├── linear_bitmap_int32.bin
│   ├── linear_compressed_qweight_int32.bin
│   ├── linear_nnz_int16.bin
│   ├── linear_scales_float16.bin
│   └── linear_zeros_int32.bin
```
|
|
|
### Usage |
|
The following script shows how to process the blobs in Python: unpacking the packed int32 words, recovering the locations of the zeroed (pruned) weights, and dequantizing the weights.
|
```bash |
|
python unpack_blobs.py |
|
``` |
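`unpack_blobs.py` is the authoritative reference for the layout; the three steps it performs can be sketched as below. The layout details here are assumptions for illustration: eight 4-bit nibbles packed little-end-first per int32 word, 32 bitmap bits per int32 word, and a fixed zero point of 8 for symmetric W4 (in the actual blobs the zero points come from `linear_zeros_int32.bin`):

```python
import numpy as np

def unpack_int4(packed):
    """Split each int32 word into eight 4-bit values, lowest nibble first
    (assumed packing order)."""
    shifts = np.arange(8, dtype=np.uint32) * 4
    vals = (packed[:, None].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(-1).astype(np.int32)

def scatter_by_bitmap(nonzeros, bitmap_words, row_len):
    """Scatter the stored nonzero values back to the positions whose bit is
    set in the bitmap; pruned positions stay zero."""
    bits = (bitmap_words[:, None].astype(np.uint32)
            >> np.arange(32, dtype=np.uint32)) & 1
    mask = bits.reshape(-1)[:row_len].astype(bool)
    dense = np.zeros(row_len, dtype=nonzeros.dtype)
    dense[mask] = nonzeros
    return dense

def dequantize(qvals, scales, group_size=32, zero=8):
    """Symmetric W4 dequant: real = (q - zero) * scale, one scale per group
    of 32 values (zero point 8 assumed here for the symmetric scheme)."""
    s = np.repeat(scales.astype(np.float32), group_size)[: qvals.shape[0]]
    return (qvals.astype(np.float32) - zero) * s
```

`linear_nnz_int16.bin` would supply the per-row nonzero counts used to slice each row's values out of the compressed qweight stream before scattering.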
|
|
|
> You can ignore `internal/`.