---
inference: false
---
# vicuna-7b-v1.3-4bit-g128-awq
Vicuna is a chat assistant trained by [LMSYS](https://lmsys.org/). This is a 4-bit AWQ quantized Vicuna v1.3 model.
[AWQ](https://github.com/mit-han-lab/llm-awq) is an **efficient and accurate** low-bit (INT3/INT4) weight quantization method for LLMs that supports instruction-tuned models and multi-modal LMs.
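
To give a rough feel for what "4-bit, group size 128" means, here is a minimal sketch of plain group-wise uniform quantization in pure Python. It assumes the `4bit-g128` suffix in the model name refers to 4-bit weights with one scale and offset per group of 128 values. It is not the AWQ algorithm itself (the activation-aware scaling step is omitted), and the function names `quantize_group`, `dequantize_group`, `quantize_row`, and `dequantize_row` are hypothetical, chosen only for illustration.

```python
import random


def quantize_group(weights, n_bits=4):
    """Map one group of floats to integer codes in [0, 2**n_bits - 1].

    Illustrative only: returns the codes plus the per-group scale and minimum.
    """
    q_max = (1 << n_bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight in the group is identical.
    scale = (w_max - w_min) / q_max or 1.0
    codes = [min(q_max, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Recover approximate floats from the integer codes of one group."""
    return [c * scale + w_min for c in codes]


def quantize_row(row, group_size=128, n_bits=4):
    """Quantize a weight row in consecutive groups of `group_size` values."""
    return [
        quantize_group(row[start:start + group_size], n_bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Concatenate the dequantized groups back into one row."""
    restored = []
    for codes, scale, w_min in groups:
        restored.extend(dequantize_group(codes, scale, w_min))
    return restored


# Toy data standing in for one weight row; real weights come from the model.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4096)]

groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

Each group keeps its own scale and minimum, so the rounding error of any weight is bounded by half of that group's step size rather than by the range of the whole row.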
## Reference
If you find AWQ useful or relevant to your research, please cite the paper:
```bibtex
@article{lin2023awq,
title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
journal={arXiv},
year={2023}
}
```
## Vicuna Model Card
### Model Details
Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
- **Developed by:** [LMSYS](https://lmsys.org/)
- **Model type:** An auto-regressive language model based on the transformer architecture.
- **License:** Non-commercial license
- **Finetuned from model:** [LLaMA](https://arxiv.org/abs/2302.13971)
#### Model Sources
- **Repository:** https://github.com/lm-sys/FastChat
- **Blog:** https://lmsys.org/blog/2023-03-30-vicuna/
- **Paper:** https://arxiv.org/abs/2306.05685
- **Demo:** https://chat.lmsys.org/