vicuna-33b-v1.3-4bit-g128-awq

Vicuna is a chat assistant trained by LMSYS. This is a 4-bit AWQ quantized Vicuna v1.3 model.

AWQ is an efficient and accurate low-bit (INT3/INT4) weight-only quantization method for LLMs, supporting instruction-tuned models and multi-modal LMs.
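To make the "4-bit, group size 128" part of the model name concrete, here is a minimal NumPy sketch of plain group-wise INT4 weight quantization. It is illustrative only: the function names are made up for this example, and it omits AWQ's key ingredient, the activation-aware per-channel scaling that protects salient weights.

```python
import numpy as np

def quantize_group_4bit(w, group_size=128):
    """Simplified group-wise symmetric INT4 quantization.

    Illustrative only -- real AWQ additionally rescales weight
    channels based on activation statistics before quantizing.
    """
    w = np.asarray(w, dtype=np.float32)
    assert w.size % group_size == 0, "weight count must divide into groups"
    groups = w.reshape(-1, group_size)
    # One scale per group: map the max-magnitude weight onto the INT4
    # range [-8, 7] (avoid a zero scale for all-zero groups).
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover approximate float weights from INT4 codes and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(512).astype(np.float32)   # 4 groups of 128
q, s = quantize_group_4bit(w, group_size=128)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()                     # bounded by max(scale)/2
```

With group size 128, each group of 128 weights shares one FP16 scale, so the storage cost is roughly 4 bits per weight plus a small per-group overhead, while the reconstruction error stays within half a quantization step per group.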

Reference

If you find AWQ useful or relevant to your research, please cite the paper:

@article{lin2023awq,
  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
  journal={arXiv},
  year={2023}
}

Vicuna Model Card

Model Details

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

  • Developed by: LMSYS
  • Model type: An auto-regressive language model based on the transformer architecture.
  • License: Non-commercial license
  • Finetuned from model: LLaMA.

Model Sources

  • Repository: https://github.com/lm-sys/FastChat
  • Blog: https://lmsys.org/blog/2023-03-30-vicuna/
  • Demo: https://chat.lmsys.org/
