Quantization made by Richard Erkhov.
st-vicuna-v1.3-10.5b-taylor - GGUF
- Model creator: https://huggingface.co/nota-ai/
- Original model: https://huggingface.co/nota-ai/st-vicuna-v1.3-10.5b-taylor/
Original model description:
Shortened LLaMA Model Card
Shortened LLaMA is a depth-pruned version of LLaMA models & variants for efficient text generation.
- Developed by: Nota AI
- License: Non-commercial license
- Repository: https://github.com/Nota-NetsPresso/shortened-llm
- Paper: https://arxiv.org/abs/2402.02834
Compression Method
After identifying unimportant Transformer blocks, we perform one-shot pruning and light LoRA-based retraining.
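The block-selection step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_blocks_to_prune` and `prune_blocks` are hypothetical, the importance scores are made-up numbers, and `ratio` here is a fraction of blocks rather than the parameter-based pruning ratio reported in the table. Computing the actual scores (PPL or Taylor+) and the LoRA retraining are out of scope for this sketch; see the repository for the real code.

```python
from typing import List, Sequence


def select_blocks_to_prune(importance: Sequence[float], ratio: float) -> List[int]:
    """Return indices of the least important blocks, in ascending index order.

    `importance` holds one score per Transformer block (higher = more important).
    `ratio` is the fraction of blocks to remove (an assumption of this sketch).
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n_drop = int(round(len(importance) * ratio))
    # Rank block indices from least to most important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return sorted(ranked[:n_drop])


def prune_blocks(blocks: Sequence, drop: Sequence[int]) -> list:
    """One-shot pruning: keep every block whose index is not in `drop`."""
    drop_set = set(drop)
    return [block for i, block in enumerate(blocks) if i not in drop_set]


# Hypothetical scores for a 5-block model; real scores come from a criterion
# such as perplexity change or a Taylor-based estimate.
scores = [0.9, 0.1, 0.5, 0.05, 0.7]
blocks = ["block0", "block1", "block2", "block3", "block4"]

to_drop = select_blocks_to_prune(scores, ratio=0.4)
kept = prune_blocks(blocks, to_drop)
print(to_drop, kept)
```

All blocks are removed in a single pass, with no re-scoring between removals, which is what makes the pruning one-shot. The remaining blocks would then be retrained briefly with LoRA to recover quality.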
Model Links
Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
---|---|---|---|
LLaMA-1-7B | 20% | PPL | nota-ai/st-llama-1-5.5b-ppl |
LLaMA-1-7B | 20% | Taylor+ | nota-ai/st-llama-1-5.5b-taylor |
Vicuna-v1.3-7B | 20% | PPL | nota-ai/st-vicuna-v1.3-5.5b-ppl |
Vicuna-v1.3-7B | 20% | Taylor+ | nota-ai/st-vicuna-v1.3-5.5b-taylor |
Vicuna-v1.3-13B | 21% | PPL | nota-ai/st-vicuna-v1.3-10.5b-ppl |
Vicuna-v1.3-13B | 21% | Taylor+ | nota-ai/st-vicuna-v1.3-10.5b-taylor |
Zero-shot Performance & Efficiency Results
- EleutherAI/lm-evaluation-harness version 3326c54
License
- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.
Acknowledgments
- LLM-Pruner, which utilizes LM Evaluation Harness, PEFT, and Alpaca-LoRA. Thanks for the pioneering work on structured pruning of LLMs!
- Meta AI's LLaMA and LMSYS Org's Vicuna. Thanks for the open-source LLMs!
Citation
@article{kim2024shortened,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={arXiv preprint arXiv:2402.02834},
year={2024},
url={https://arxiv.org/abs/2402.02834}
}
@article{kim2024mefomo,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
year={2024},
url={https://openreview.net/forum?id=18VGxuOdpu}
}