---
inference: false
---

# vicuna-13b-v1.3-4bit-g128-awq

Vicuna is a chat assistant trained by [LMSYS](https://lmsys.org/). This is a 4-bit (group size 128) AWQ-quantized Vicuna v1.3 model.
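To see why the 4-bit quantization matters for a 13B-parameter model, here is some back-of-envelope weight-memory arithmetic (an illustrative sketch, not a measured figure; it ignores activations, KV cache, and packing overhead):

```python
# Rough weight-memory arithmetic for a 13B-parameter model.
# Illustrative only: ignores activations, KV cache, and packing overhead.
params = 13e9

fp16_gib = params * 2 / 2**30      # 2 bytes per weight in FP16: ~24.2 GiB
int4_gib = params * 0.5 / 2**30    # 4 bits = 0.5 bytes per weight: ~6.1 GiB
# g128 additionally stores one FP16 scale per group of 128 weights.
scale_gib = (params / 128) * 2 / 2**30

print(f"FP16 weights : {fp16_gib:.1f} GiB")
print(f"INT4 weights : {int4_gib + scale_gib:.1f} GiB (incl. g128 scales)")
```

In other words, the quantized checkpoint needs roughly a quarter of the weight memory of the FP16 original, with the per-group scales adding well under 1 GiB of overhead.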

[AWQ](https://github.com/mit-han-lab/llm-awq) is an **efficient and accurate** low-bit weight quantization method (INT3/4) for LLMs that supports instruction-tuned models and multi-modal LMs.
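To make the "4-bit, g128" naming concrete, below is a minimal pure-Python sketch of symmetric group-wise weight quantization, the basic mechanism that AWQ builds on. AWQ additionally rescales salient channels using activation statistics before quantizing; that part is not shown, and the function names here are illustrative, not taken from the llm-awq codebase.

```python
def quantize_group(weights, n_bits=4):
    """Symmetric quantization of one group: map max |w| to the top of the INT range."""
    qmax = 2 ** (n_bits - 1) - 1                      # 7 for INT4
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:                                  # all-zero group
        scale = 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate FP weights from INT codes and the group scale."""
    return [qi * scale for qi in q]

def quantize_g128(weights, group_size=128):
    """g128: every contiguous run of 128 weights shares a single scale."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]
```

Per-group scales keep quantization error local: an outlier weight only inflates the scale of its own 128-weight group rather than the whole tensor, which is the motivation for small group sizes such as 128.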

## Reference

If you find AWQ useful or relevant to your research, please kindly cite the paper:

```bibtex
@article{lin2023awq,
  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
  journal={arXiv},
  year={2023}
}
```

## Vicuna Model Card

### Model Details

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

- **Developed by:** [LMSYS](https://lmsys.org/)
- **Model type:** An auto-regressive language model based on the transformer architecture.
- **License:** Non-commercial license
- **Finetuned from model:** [LLaMA](https://arxiv.org/abs/2302.13971)

#### Model Sources

- **Repository:** https://github.com/lm-sys/FastChat
- **Blog:** https://lmsys.org/blog/2023-03-30-vicuna/
- **Paper:** https://arxiv.org/abs/2306.05685
- **Demo:** https://chat.lmsys.org/