vuiseng9
/

ov-weight-quantized-llms

Vui Seng Chua

## OpenVINO Weight-Quantized LLMs

This repo contains binary blobs of LLM weights quantized by [OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/254-llm-chatbot/254-llm-chatbot.ipynb).

| LLM             | ratio | group_size |
|-----------------|-------|------------|
| llama-2-chat-7b | 0.8   | 128        |
| mistral-7b      | 0.6   | 64         |
| gemma-2b-it     | 0.6   | 64         |

Notes:
* `ratio=0.8` means 80% of the FC (linear) layers are weight-quantized to 4 bits and the rest to 8 bits.
* `group_size` is the number of weight elements that are quantized together as one group, e.g. a group size of 128 means each output channel of the weight is split into groups of 128 elements for quantization.
* `q_weight` is stored as uint8 even when it is 4-bit.
* Scripts are for internal use only.
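The grouping described in the notes can be sketched in NumPy. This is a minimal illustration, not the code used to produce the blobs: the function name `quantize_groupwise_u4` is hypothetical, and the asymmetric min/max scheme with one scale and zero point per group is an assumption made for the example. It shows why a `(4096, 11008)` weight with a group size of 128 is stored as `(4096, 86, 128)`, since 11008 / 128 = 86, and how 4-bit values end up in a uint8 array.

```python
import numpy as np


def quantize_groupwise_u4(weight: np.ndarray, group_size: int):
    """Illustrative asymmetric 4-bit group-wise quantization (assumed scheme)."""
    out_ch, in_ch = weight.shape
    assert in_ch % group_size == 0, "in_ch must be divisible by group_size"
    # Split each output channel into groups of `group_size` elements.
    grouped = weight.reshape(out_ch, in_ch // group_size, group_size)
    w_min = grouped.min(axis=-1, keepdims=True)
    w_max = grouped.max(axis=-1, keepdims=True)
    # One scale and zero point per group; the unsigned 4-bit range is [0, 15].
    scale = np.maximum((w_max - w_min) / 15.0, 1e-8)
    zero_point = np.clip(np.round(-w_min / scale), 0, 15)
    q = np.clip(np.round(grouped / scale + zero_point), 0, 15)
    # Each 4-bit value occupies one uint8 element (no bit packing).
    return q.astype(np.uint8), scale, zero_point


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)  # small stand-in weight
q_weight, scale, zero_point = quantize_groupwise_u4(w, group_size=128)

# Dequantize to check the round trip.
deq = (q_weight.astype(np.float32) - zero_point) * scale
print(q_weight.shape, q_weight.dtype, int(q_weight.max()))
print("max abs error:", float(np.abs(deq - w.reshape(8, 2, 128)).max()))
```

A smaller group size gives each scale fewer elements to cover, which usually lowers the quantization error at the cost of storing more scales and zero points.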

### Example usage of the saved blob
```python
import torch

blob_path = "./mistral-7b_r0.6_g64.pth"

# Load the saved dictionary, keyed by layer name.
blob = torch.load(blob_path)

# Print the quantized dtype, original shape, and stored shape of each layer.
for layer, attr in blob.items():
    print(f"{layer:30} | q_dtype: {attr['q_dtype']:5} | orig. shape: {str(attr['original_shape']):15} | quantized_shape: {str(attr['q_weight'].shape):15}")
```

```
# Sample output (excerpt; these shapes and the 128-element groups correspond to llama-2-chat-7b):
.
.
layers.14.mlp.gate_proj        | q_dtype: u4    | orig. shape: (11008, 4096)   | quantized_shape: (11008, 32, 128)
layers.14.mlp.down_proj        | q_dtype: u4    | orig. shape: (4096, 11008)   | quantized_shape: (4096, 86, 128)
layers.15.self_attn.k_proj     | q_dtype: u8    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 4096)
layers.15.self_attn.v_proj     | q_dtype: u8    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 4096)
layers.15.self_attn.q_proj     | q_dtype: u4    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 32, 128)
layers.15.self_attn.o_proj     | q_dtype: u4    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 32, 128)
layers.15.mlp.up_proj          | q_dtype: u4    | orig. shape: (11008, 4096)   | quantized_shape: (11008, 32, 128)
layers.15.mlp.gate_proj        | q_dtype: u4    | orig. shape: (11008, 4096)   | quantized_shape: (11008, 32, 128)
layers.15.mlp.down_proj        | q_dtype: u4    | orig. shape: (4096, 11008)   | quantized_shape: (4096, 86, 128)
layers.16.self_attn.k_proj     | q_dtype: u8    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 4096)
layers.16.self_attn.v_proj     | q_dtype: u8    | orig. shape: (4096, 4096)    | quantized_shape: (4096, 4096)
.
.
```
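As a quick sanity check on a loaded blob, the per-layer attributes can be summarized to see what share of layers is `u4` versus `u8` and which group size the `u4` layers use. The sketch below is an assumption-laden illustration: `summarize_blob` and `fake_entry` are hypothetical helpers, the blob is a small synthetic stand-in that only mimics the `q_dtype`, `original_shape`, and `q_weight` keys seen above, and it assumes that `u4` weights are stored with the group size as the last dimension, as in the sample output.

```python
from collections import Counter
from math import prod

import numpy as np


def summarize_blob(blob):
    """Return the share of layers per q_dtype and the group sizes of u4 layers."""
    counts = Counter(attr["q_dtype"] for attr in blob.values())
    total = sum(counts.values())
    shares = {dtype: n / total for dtype, n in counts.items()}
    group_sizes = set()
    for attr in blob.values():
        q_shape = tuple(attr["q_weight"].shape)
        # Grouping only reshapes the weight, so the element count is unchanged.
        assert prod(q_shape) == prod(attr["original_shape"])
        if attr["q_dtype"] == "u4":
            # Assumption: the last dimension of a u4 weight is the group size.
            group_sizes.add(q_shape[-1])
    return shares, group_sizes


def fake_entry(q_dtype, shape, group_size=None):
    """Build a synthetic entry with the same keys as the real blob (hypothetical)."""
    out_ch, in_ch = shape
    q_shape = shape if group_size is None else (out_ch, in_ch // group_size, group_size)
    return {
        "q_dtype": q_dtype,
        "original_shape": shape,
        "q_weight": np.zeros(q_shape, dtype=np.uint8),
    }


# Synthetic stand-in; pass the dictionary returned by torch.load(...) instead.
blob = {
    "layers.0.self_attn.q_proj": fake_entry("u4", (8, 256), 128),
    "layers.0.self_attn.o_proj": fake_entry("u4", (8, 256), 128),
    "layers.0.self_attn.k_proj": fake_entry("u8", (8, 256)),
    "layers.0.mlp.up_proj": fake_entry("u4", (16, 256), 128),
    "layers.0.mlp.down_proj": fake_entry("u4", (8, 512), 128),
}

shares, group_sizes = summarize_blob(blob)
print(shares, group_sizes)
```

This counts layers, following the note on `ratio` above; on a real blob the `u4` share can be compared against the `ratio` column of the table.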