---
inference: false
---

# vicuna-7b-v1.3-4bit-g128-awq

Vicuna is a chat assistant trained by [LMSYS](https://lmsys.org/). This repository provides a 4-bit AWQ-quantized version of Vicuna-7B v1.3, quantized with a group size of 128.

[AWQ](https://github.com/mit-han-lab/llm-awq) is an **efficient and accurate** low-bit (INT3/INT4) weight-only quantization method for LLMs. It supports both instruction-tuned models and multi-modal LMs.
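The two ideas behind this checkpoint, group-wise INT4 weight quantization and activation-aware scaling, can be illustrated with a small NumPy sketch. It assumes asymmetric round-to-nearest quantization with one scale and zero point per group of 128 input channels (the group size suggested by `g128` in the model name), plus a grid search over an exponent α for per-input-channel scales s = mean|x|^α, in the spirit of the AWQ paper. The function names are hypothetical, the data is random, and this is neither the llm-awq implementation nor the packed on-disk format of this model.

```python
import numpy as np


def quantize_groupwise(w: np.ndarray, n_bits: int = 4, group_size: int = 128):
    """Asymmetric round-to-nearest quantization with one (scale, zero) pair per
    group of `group_size` consecutive input channels in each output row."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"
    groups = w.reshape(out_features, in_features // group_size, group_size)
    q_max = 2**n_bits - 1  # 15 for INT4
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-5) / q_max
    zero = np.clip(np.round(-w_min / scale), 0, q_max)
    q = np.clip(np.round(groups / scale) + zero, 0, q_max).astype(np.uint8)
    return q, scale, zero


def dequantize_groupwise(q, scale, zero, shape):
    """Map integer codes back to floats: (q - zero) * scale."""
    return ((q.astype(np.float64) - zero) * scale).reshape(shape)


def fake_quant(w, n_bits=4, group_size=128):
    """Quantize and immediately dequantize, to measure the rounding error."""
    q, scale, zero = quantize_groupwise(w, n_bits, group_size)
    return dequantize_groupwise(q, scale, zero, w.shape)


def search_activation_aware_scale(w, x, n_grid=20, n_bits=4, group_size=128):
    """Grid-search per-input-channel scales s = mean|x|**alpha (AWQ-style sketch).

    Input channels with large activations are scaled up before quantization and
    the scale is divided back out afterwards, which reduces their relative
    rounding error. Returns (best_alpha, best_output_mse).
    """
    act_mag = np.maximum(np.abs(x).mean(axis=0), 1e-4)  # one value per input channel
    ref = x @ w.T  # full-precision layer output
    best_alpha, best_err = None, np.inf
    for i in range(n_grid):
        alpha = i / n_grid  # alpha = 0 reproduces plain round-to-nearest
        s = act_mag**alpha
        s = s / np.sqrt(s.max() * s.min())  # keep the scales centered around 1
        w_hat = fake_quant(w * s, n_bits, group_size) / s  # scale columns, quantize, undo
        err = np.mean((x @ w_hat.T - ref) ** 2)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Toy layer: 64 outputs, 256 inputs (two groups of 128), random calibration data.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
x = rng.normal(size=(32, 256))
x[:, :4] *= 20.0  # a few input channels with outsized activations

q, scale, zero = quantize_groupwise(w)
rtn_err = np.mean((x @ fake_quant(w).T - x @ w.T) ** 2)
alpha, awq_err = search_activation_aware_scale(w, x)
print(f"RTN MSE={rtn_err:.4f}  best alpha={alpha}  activation-aware MSE={awq_err:.4f}")
```

Because α = 0 is part of the grid and reproduces plain round-to-nearest exactly, the search can never do worse than RTN on the calibration data; when a few channels dominate the activations it usually picks some α > 0.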


## Reference
If you find AWQ useful or relevant to your research, please cite the paper:

```bibtex
@article{lin2023awq,
  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
  journal={arXiv},
  year={2023}
}
```


## Vicuna Model Card

### Model Details

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

- **Developed by:** [LMSYS](https://lmsys.org/)
- **Model type:** An auto-regressive language model based on the transformer architecture.
- **License:** Non-commercial license
- **Finetuned from model:** [LLaMA](https://arxiv.org/abs/2302.13971)

#### Model Sources

- **Repository:** https://github.com/lm-sys/FastChat
- **Blog:** https://lmsys.org/blog/2023-03-30-vicuna/
- **Paper:** https://arxiv.org/abs/2306.05685
- **Demo:** https://chat.lmsys.org/