nm-testing
/

SparseLLama-2-7b-ultrachat_200k-pruned_70

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

SparseLLama-2-7b-ultrachat_200k-pruned_70 / README.md

alexmarques's picture

Update README.md

a78e03f verified 3 months ago

|

No virus

2.41 kB

	---
	datasets:
	- HuggingFaceH4/ultrachat_200k
	language:
	- en
	pipeline_tag: text-generation
	---

	# SparseLlama-2-7b-ultrachat_200k-pruned_70

	## Model Overview
	- Model Architecture: Llama-2
	- Input: Text
	- Output: Text
	- Model Optimizations:
	- Pruned: 70%
	- Release Date: 6/28/2024
	- Version: 1.0
	- Model Developers: Neural Magic

	Compressed version of [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) specialized for text-generation.
	This model was obtained by fine-tuning the Sparse Foundational model [Sparse-Llama-2-7b-pruned_70](https://huggingface.co/nm-testing/SparseLlama-2-7b-pruned_70) on the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
	It achieves a win rate of 59.8% on the [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) benchmark (version 1.0) when using [Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as evaluator, whereas the dense [Llama-2-7b-ultrachat200k](https://huggingface.co/neuralmagic/Llama-2-7b-ultrachat200k) model achieves 57.6% win rate.

	This model was produced as part if Neural Magic's Sparse Foundational Models initiative, and demostrates the capability of Sparse Foundational Models to transfer to the text-generation domain.

	Note: This model uses the chat template from [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

	## Model Optimizations

	This model is derived from the Sparse Foundational model [Sparse-Llama-2-7b-pruned_70](https://huggingface.co/nm-testing/SparseLlama-2-7b-pruned_70), which was obtained by applying the [SparseGPT](https://arxiv.org/abs/2301.00774) algorithm to prune [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) to 70% sparsity.
	This optimization reduces the number of parameters by 70%, reducing the disk size and FLOPs by the same level.

	## Evaluation

	This model was evaluated in the [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) benchmark using [Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as evaluator.

	## Accuracy

	\| Model \| Win rate \| Recovery \|
	\| :----- \| :--------: \| :--------: \|
	\| [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) \| 3.7% \| -- \|
	\| [Llama-2-7b-ultrachat200k](https://huggingface.co/neuralmagic/Llama-2-7b-ultrachat200k) \| 57.6% \| -- \|
	\| SparseLlama-2-7b-ultrachat_200k-pruned_70 \| 59.8% \| 104% \|