
vicuna-33b-v1.3-4bit-g128-awq

Vicuna is a chat assistant trained by LMSYS. This repository contains the 33B Vicuna v1.3 model with its weights quantized to 4 bits using AWQ.

AWQ is an efficient and accurate low-bit (INT3/INT4) weight quantization method for LLMs. It supports instruction-tuned models and multi-modal LMs.
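To make "4-bit, g128" concrete, the following is a minimal sketch of plain group-wise round-to-nearest INT4 quantization, assuming that `g128` in the model name denotes a group size of 128 weights. It is not the AWQ algorithm itself: AWQ additionally rescales weight channels based on activation statistics before quantizing, and that step is omitted here. The function names and the choice to store a per-group minimum (rather than an integer zero point) are illustrative assumptions, not the layout used by this checkpoint.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, group minimum)


def quantize_group(weights: List[float], n_bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group to n_bits unsigned codes."""
    q_max = (1 << n_bits) - 1              # 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    # Step size between adjacent codes; fall back to 1.0 for a constant group.
    scale = (w_max - w_min) / q_max or 1.0
    codes = [min(q_max, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


def quantize_row(row: List[float], group_size: int = 128, n_bits: int = 4) -> List[Group]:
    """Quantize a weight row in consecutive groups, each with its own scale."""
    return [
        quantize_group(row[i:i + group_size], n_bits)
        for i in range(0, len(row), group_size)
    ]
```

Because every group of 128 weights carries its own scale and offset, an outlier only coarsens the grid inside its own group. The reconstruction error of each weight is at most half of that group's scale.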

Reference

If you find AWQ useful or relevant to your research, please kindly cite the paper:

@article{lin2023awq,
  title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
  author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
  journal={arXiv},
  year={2023}
}

Vicuna Model Card

Model Details

Vicuna is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.

  • Developed by: LMSYS
  • Model type: An auto-regressive language model based on the transformer architecture.
  • License: Non-commercial license
  • Finetuned from model: LLaMA.
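Because Vicuna is fine-tuned on multi-turn conversations, it expects prompts in a conversation format. This card does not state the template, so the sketch below assumes the one distributed with FastChat for Vicuna v1.1 and later (a fixed system sentence, `USER:` / `ASSISTANT:` roles, and `</s>` after each completed assistant turn). `build_prompt` is a hypothetical helper written for illustration and is not part of any library.

```python
from typing import List, Optional, Tuple

# Assumed system prompt, as used by the FastChat Vicuna v1.1+ template.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns: List[Tuple[str, Optional[str]]], system: str = SYSTEM) -> str:
    """Build a prompt from (user, assistant) turns.

    Pass None as the assistant message of the last turn to leave the
    prompt open for the model to complete.
    """
    prompt = system + " "
    for user, assistant in turns:
        prompt += f"USER: {user} "
        if assistant is None:
            prompt += "ASSISTANT:"                   # open slot for generation
        else:
            prompt += f"ASSISTANT: {assistant}</s>"  # completed turn
    return prompt
```

For example, `build_prompt([("Hello!", None)])` yields the system sentence followed by `USER: Hello! ASSISTANT:`, which is the string to pass to the tokenizer.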

Model Sources
