yujiepan/llama-2-13b-w8a8-unstructured50
This model is w8a8 quantized and sparsified (unstructured) with OpenVINO, exported from meta-llama/Llama-2-13b-hf. It is not tuned for accuracy.
- Quantization: 8-bit symmetric for weights & activations
- Unstructured sparsity in transformer block linear layers: 50%
Code for export: https://gist.github.com/yujiepan-work/1e6dd9f9c2aac0e9ecaf2ed4d82d1158
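Below is a minimal loading sketch, not part of the original card: it assumes the repository contains an OpenVINO IR consumable by `optimum-intel` (`optimum[openvino]` installed), and the prompt and generation settings are placeholders.

```python
# Hedged usage sketch: load the OpenVINO-exported model with optimum-intel.
# Assumes the repo ships an OpenVINO IR compatible with OVModelForCausalLM.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "yujiepan/llama-2-13b-w8a8-unstructured50"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; generation settings are illustrative only.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```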