QEFT: Quantization for Efficient Fine-Tuning of LLMs
Paper: arXiv:2410.08661
This checkpoint is meta-llama/Llama-2-13b-hf with QEFT Offline Global Reordering (OGR) applied. Only the channel order has been changed; no quantization has been applied.
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-2-13b-hf |
| Method | QEFT OGR (Offline Global Reordering) |
| Outlier channels (k) | 128 |
| Quantization | None (reordering only) |
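To give a feel for what "reordering only" means, here is a minimal sketch of channel reordering on a toy linear layer. It is not the QEFT implementation. The per-channel `scores`, the helper names, and the choice to place the outlier channels first are all assumptions made for illustration. The point it shows is that applying the same permutation to the weight's input channels and to the incoming activations leaves the layer output unchanged.

```python
# Illustrative sketch only, NOT the QEFT/OGR code.
# Assumption: "outlier channels" are the k input channels with the highest scores,
# and they are grouped at the front (the real layout may differ).

def outlier_first_permutation(scores, k):
    """Return channel indices with the k highest-scoring channels placed first."""
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    outliers = sorted(by_score[:k])
    outlier_set = set(outliers)
    rest = [i for i in range(len(scores)) if i not in outlier_set]
    return outliers + rest

def matvec(weight, x):
    """Plain matrix-vector product for a list-of-rows weight."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def permute_columns(weight, perm):
    """Reorder the input channels (columns) of the weight."""
    return [[row[j] for j in perm] for row in weight]

def permute_vector(x, perm):
    """Reorder the activation channels with the same permutation."""
    return [x[j] for j in perm]

weight = [[1, 2, 3, 4], [5, 6, 7, 8]]   # toy 2x4 layer
x = [10, 20, 30, 40]                     # toy input activations
scores = [0.1, 9.0, 0.2, 7.5]            # hypothetical per-channel outlier scores

perm = outlier_first_permutation(scores, k=2)
reordered_weight = permute_columns(weight, perm)
reordered_x = permute_vector(x, perm)

# Same output before and after reordering.
assert matvec(reordered_weight, reordered_x) == matvec(weight, x)
```

In this checkpoint k is 128 rather than 2, and the permutation was computed offline, so nothing needs to be reordered at load time.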

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the reordered checkpoint and its tokenizer
model = AutoModelForCausalLM.from_pretrained("jsyeom/reordered_models/llama-2-13b-hf-reordered")
tokenizer = AutoTokenizer.from_pretrained("jsyeom/reordered_models/llama-2-13b-hf-reordered")
```

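Hub repository ids have the form `namespace/name`, so a three-part path like the one above only resolves directly if it is a local directory. The sketch below assumes, without confirmation from this card, that the repository is `jsyeom/reordered_models` and that the checkpoint lives in a subfolder of it. `split_hub_path` and `load_reordered` are hypothetical helpers; `subfolder` is an existing `from_pretrained` argument.

```python
def split_hub_path(path):
    """Split 'namespace/name/sub/dir' into ('namespace/name', 'sub/dir')."""
    parts = path.strip("/").split("/")
    return "/".join(parts[:2]), "/".join(parts[2:])

def load_reordered(path):
    """Load model and tokenizer, assuming the checkpoint sits in a repo subfolder."""
    # Imported lazily so split_hub_path can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id, subfolder = split_hub_path(path)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    return model, tokenizer

repo_id, subfolder = split_hub_path("jsyeom/reordered_models/llama-2-13b-hf-reordered")
```

Calling `load_reordered("jsyeom/reordered_models/llama-2-13b-hf-reordered")` would then download the 13B checkpoint, provided the assumed repository layout is right.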