	---
	inference: false
	---

	# vicuna-7b-v1.3-4bit-g128-awq

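	Vicuna is a chat assistant trained by [LMSYS](https://lmsys.org/). This repository provides Vicuna-7B v1.3 with its weights quantized to 4 bits using AWQ (the `g128` in the name denotes a quantization group size of 128).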

	[AWQ](https://github.com/mit-han-lab/llm-awq) is an efficient and accurate low-bit (INT3/INT4) weight quantization method for LLMs. It supports instruction-tuned models and multi-modal LMs.
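
To make "4-bit, group size 128" concrete, here is a minimal sketch of group-wise low-bit weight quantization in plain Python. It is not the AWQ implementation: it shows only ordinary round-to-nearest quantization with one scale and zero point per group, and it leaves out the activation-aware scaling step that distinguishes AWQ. The function names (`quantize_group`, `dequantize_group`, `quantize_row`) and the asymmetric min/max scheme are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], n_bits: int = 4) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric quantization of one group of weights.

    Hypothetical helper for illustration; returns (integer codes, scale, zero point).
    """
    q_max = (1 << n_bits) - 1              # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max
    if scale == 0.0:                       # constant group: any non-zero scale works
        scale = 1.0
    zero = max(0, min(q_max, round(-w_min / scale)))
    codes = [max(0, min(q_max, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row: List[float], group_size: int = 128, n_bits: int = 4):
    """Split one weight row into consecutive groups and quantize each separately."""
    return [
        quantize_group(row[i:i + group_size], n_bits)
        for i in range(0, len(row), group_size)
    ]
```

Each group stores its own scale and zero point, so a smaller group size tracks the local weight range more closely at the cost of more per-group metadata.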




	## Reference
	If you find AWQ useful or relevant to your research, please cite the paper:

	```bibtex
	@article{lin2023awq,
	title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
	author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
	journal={arXiv},
	year={2023}
	}
	```


	## Vicuna Model Card

	### Model Details

	Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

	- Developed by: [LMSYS](https://lmsys.org/)
	- Model type: An auto-regressive language model based on the transformer architecture.
	- License: Non-commercial license
	- Finetuned from model: [LLaMA](https://arxiv.org/abs/2302.13971).
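
As a rough illustration of what "auto-regressive" means in the model type above, the sketch below generates tokens one at a time, feeding each chosen token back in as input for the next step. It uses greedy selection and a toy scoring function in place of a real model. `greedy_decode`, `toy_scores`, and the five-token vocabulary are hypothetical names and values, not part of Vicuna or FastChat.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], List[float]],
    prompt: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Generate tokens one by one, each conditioned on all previous tokens."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)           # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)                       # the output becomes part of the input
        if next_id == eos_id:                        # stop at the end-of-sequence token
            break
    return tokens


def toy_scores(tokens: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a language model: favors (last token + 1) mod 5."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode(toy_scores, prompt=[0], eos_id=4))  # prints [0, 1, 2, 3, 4]
```

A real chat model replaces `toy_scores` with a transformer forward pass and usually samples from the score distribution instead of always taking the maximum.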

	#### Model Sources

	- Repository: https://github.com/lm-sys/FastChat
	- Blog: https://lmsys.org/blog/2023-03-30-vicuna/
	- Paper: https://arxiv.org/abs/2306.05685
	- Demo: https://chat.lmsys.org/