|
|
|
This repo contains serialized blobs for the up-projection layer of Llama-3-8B (oc=14336, ic=4096).
|
The linear layer has been quantized (GPTQ W4 Sym with group size 32) and sparsified by 50%. |
|
|
|
```
├── sparse_w4
│   ├── linear_bitmap_int32.bin
│   ├── linear_compressed_qweight_int32.bin
│   ├── linear_nnz_int16.bin
│   ├── linear_scales_float16.bin
│   └── linear_zeros_int32.bin
```
|
|
|
### Usage |
|
The following script shows how to process the blobs in Python: unpacking the packed int32 words, recovering the locations of the zeroed (pruned) weights, and dequantizing the weights.
|
```bash |
|
python unpack_blobs.py |
|
``` |
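`unpack_blobs.py` is the authoritative reference for the layout; the three steps it performs can be sketched as below. The layout details here are assumptions for illustration: eight 4-bit nibbles packed little-end-first per int32 word, 32 bitmap bits per int32 word, and a fixed zero point of 8 for symmetric W4 (in the actual blobs the zero points come from `linear_zeros_int32.bin`):

```python
import numpy as np

def unpack_int4(packed):
    """Split each int32 word into eight 4-bit values, lowest nibble first
    (assumed packing order)."""
    shifts = np.arange(8, dtype=np.uint32) * 4
    vals = (packed[:, None].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(-1).astype(np.int32)

def scatter_by_bitmap(nonzeros, bitmap_words, row_len):
    """Scatter the stored nonzero values back to the positions whose bit is
    set in the bitmap; pruned positions stay zero."""
    bits = (bitmap_words[:, None].astype(np.uint32)
            >> np.arange(32, dtype=np.uint32)) & 1
    mask = bits.reshape(-1)[:row_len].astype(bool)
    dense = np.zeros(row_len, dtype=nonzeros.dtype)
    dense[mask] = nonzeros
    return dense

def dequantize(qvals, scales, group_size=32, zero=8):
    """Symmetric W4 dequant: real = (q - zero) * scale, one scale per group
    of 32 values (zero point 8 assumed here for the symmetric scheme)."""
    s = np.repeat(scales.astype(np.float32), group_size)[: qvals.shape[0]]
    return (qvals.astype(np.float32) - zero) * s
```

`linear_nnz_int16.bin` would supply the per-row nonzero counts used to slice each row's values out of the compressed qweight stream before scattering.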
|
|
|
> You can ignore `internal/`.