llama-2-13b-hf-reordered

This model is meta-llama/Llama-2-13b-hf with QEFT Offline Global Reordering (OGR) applied to its weights. The weights are reordered only; no quantization has been applied.

Reordering Configuration

| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-2-13b-hf |
| Method | QEFT OGR (Offline Global Reordering) |
| Outlier channels (k) | 128 |
| Quantization | None (reordering only) |
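
To make the role of k concrete, here is a minimal NumPy sketch of building a channel permutation that groups k selected channels together. It is an illustration only, not the QEFT implementation: the helper name `build_reorder_permutation`, the selection rule (largest column norm), and the placement of the selected channels at the end are all assumptions made for this example. This card does not state how OGR selects or places the outlier channels.

```python
import numpy as np

def build_reorder_permutation(weight, k):
    """Hypothetical helper: return an input-channel permutation that keeps the
    non-outlier columns in their original order and moves the k outlier
    columns to the end.

    ASSUMPTION: outliers are chosen by largest column L2 norm. The actual
    OGR selection criterion is not given in this model card.
    """
    col_norms = np.linalg.norm(weight, axis=0)         # one norm per input channel
    outliers = np.sort(np.argsort(col_norms)[-k:])     # k largest norms, in index order
    keep_mask = np.ones(weight.shape[1], dtype=bool)
    keep_mask[outliers] = False
    rest = np.flatnonzero(keep_mask)                   # remaining channels, original order
    return np.concatenate([rest, outliers])

# Toy weight with layout (out_features, in_features), as in torch.nn.Linear.
toy_weight = np.ones((4, 6))
toy_weight[:, 1] = 10.0   # make columns 1 and 4 stand out
toy_weight[:, 4] = 20.0

perm = build_reorder_permutation(toy_weight, k=2)
reordered = toy_weight[:, perm]   # the two large columns are now the last two
```

In this toy case the permutation is applied with `weight[:, perm]`; for the real model k is 128, as listed in the table above.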

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repo IDs take the form namespace/name
repo_id = "jsyeom/llama-2-13b-hf-reordered"

model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

Important Notes

  • o_proj: columns (input dim) are NOT reordered due to the multi-head attention structure. Only rows are reordered. (See QEFT paper Section 3.2 and Limitations.)
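
The rows-versus-columns distinction in the note above can be illustrated with a small NumPy sketch. It uses toy matrices rather than the model's weights and assumes the `torch.nn.Linear` layout of (out_features, in_features), so rows index the output dimension and columns index the input dimension. The names `W1`, `W2`, and `perm` are made up for the example, and nothing here is taken from the QEFT code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 6, 4

W1 = rng.standard_normal((d_hidden, d_in))   # producer: y = W1 @ x
W2 = rng.standard_normal((d_out, d_hidden))  # consumer: z = W2 @ y
x = rng.standard_normal(d_in)

# Paired reordering: permute the producer's rows (output dim) and the
# consumer's columns (input dim) with the same permutation.
perm = rng.permutation(d_hidden)
W1_r = W1[perm, :]
W2_r = W2[:, perm]

z_ref = W2 @ (W1 @ x)
z_paired = W2_r @ (W1_r @ x)
same = np.allclose(z_ref, z_paired)          # composed function is unchanged

# Rows-only reordering (the o_proj case described above): the input side is
# left in its original layout, so the output comes out in permuted order.
out_perm = rng.permutation(d_out)
y = W1 @ x
rows_only_ok = np.allclose(W2[out_perm, :] @ y, (W2 @ y)[out_perm])
```

The paired case shows why a consistent reordering does not change what the network computes. The rows-only case shows that reordering just the rows of a weight permutes that layer's output channels while leaving its expected input layout untouched.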

Reference
