alexmarques committed on
Commit 0dfc961 (verified) · 1 parent: c590d1d

Create README.md

README.md (added, +57 -0)

---
tags:
- vllm
- sparsity
pipeline_tag: text-generation
license: llama3.1
base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
---

# Sparse-Llama-3.1-8B-ultrachat_200k-2of4

## Model Overview
- **Model Architecture:** Llama-3.1-8B
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Sparsity:** 2:4
- **Release Date:** 11/21/2024
- **Version:** 1.0
- **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
- **Model Developers:** Neural Magic

This is a multi-turn conversational AI model obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
On the [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) benchmark (version 1), it achieves a win rate of 61.1, compared to 62.0 for the fine-tuned dense model [Llama-3.1-8B-ultrachat_200k](https://huggingface.co/neuralmagic/Llama-3.1-8B-ultrachat_200k), which corresponds to **98.5% accuracy recovery**.

### Model Optimizations

This model inherits the optimizations of its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained and two are pruned (set to zero).
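
To make the 2:4 pattern concrete, here is a minimal sketch that prunes a single weight row. It uses weight magnitude as the selection criterion purely for illustration. This card does not say how the parent model chose which two weights to drop in each group. The names `prune_2of4` and `is_2of4` are hypothetical.

```python
from typing import List


def prune_2of4(row: List[float]) -> List[float]:
    """Apply a 2:4 pattern to one weight row (illustrative sketch).

    In every group of four consecutive weights, the two with the largest
    absolute value are kept and the other two are set to zero. Using
    magnitude as the criterion is an assumption, not the parent model's method.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def is_2of4(row: List[float]) -> bool:
    """Check that every group of four has at most two nonzero weights."""
    return all(
        sum(1 for w in row[s:s + 4] if w != 0.0) <= 2
        for s in range(0, len(row), 4)
    )


print(prune_2of4([0.1, -0.9, 0.3, 0.05, 2.0, 0.0, -1.5, 0.2]))
# [0.0, -0.9, 0.3, 0.0, 2.0, 0.0, -1.5, 0.0]
```

In a full model, the same two-of-four rule would be applied along each row of every linear layer's weight matrix.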

## Deployment with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend. vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
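
The following is a minimal client sketch for the OpenAI-compatible server. It assumes the model was started locally with `vllm serve neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4` and is listening on vLLM's default port 8000. The repository id is inferred from the model name and developer. The sketch uses only the Python standard library and the plain `/v1/completions` endpoint. The helper names `build_completion_request` and `complete`, the prompt, and the sampling values are hypothetical. A chat-style request to `/v1/chat/completions` would also depend on the tokenizer shipping a chat template, which this card does not state.

```python
import json
import urllib.request

# Assumed repository id and local server address (vLLM's default port is 8000).
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4"
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(prompt: str, max_tokens: int = 128,
                             temperature: float = 0.7) -> dict:
    """Build an OpenAI-style /v1/completions payload (illustrative values)."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str) -> str:
    """POST the payload to a running vLLM server and return the generated text."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible completion responses put the text under choices[i].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Explain 2:4 sparsity in one sentence.")` returns the generated continuation as a string. The official `openai` client pointed at the same base URL works equally well. The standard library is used here only to keep the sketch dependency-free.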

## Evaluation

This model was evaluated on Neural Magic's fork of the [AlpacaEval](https://github.com/neuralmagic/alpaca_eval) benchmark.
We adopt the same setup as in [Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment](https://arxiv.org/abs/2405.03594), using version 1 of the benchmark and [Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as the annotator.
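
AlpacaEval compares each response from the evaluated model against a reference model's response, and the annotator (here Llama-2-70b-chat) judges which one it prefers. The reported win rate is, in simplified form, the percentage of prompts the evaluated model wins. The sketch below assumes each judgment is recorded as `"win"`, `"loss"`, or `"tie"` and counts a tie as half a win. This is an illustrative convention, not necessarily how the Neural Magic fork aggregates annotations, and `win_rate` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def win_rate(judgments: Iterable[str]) -> float:
    """Return the win rate in percent from per-prompt pairwise judgments.

    Each judgment is "win", "loss", or "tie" from the evaluated model's
    point of view. Counting a tie as half a win is an assumed convention.
    """
    counts = Counter(judgments)
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected judgments: {sorted(unknown)}")
    total = counts["win"] + counts["loss"] + counts["tie"]
    if total == 0:
        raise ValueError("no judgments")
    return 100.0 * (counts["win"] + 0.5 * counts["tie"]) / total


print(win_rate(["win", "loss", "win", "tie"]))  # 62.5
```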

### Accuracy
#### AlpacaEval Benchmark
<table>
  <tr>
    <td><strong>Metric</strong></td>
    <td style="text-align: center"><strong>Llama-3.1-8B-ultrachat_200k</strong></td>
    <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-ultrachat_200k-2of4</strong></td>
  </tr>
  <tr>
    <td>Win rate</td>
    <td style="text-align: center">62.0</td>
    <td style="text-align: center">61.1</td>
  </tr>
</table>
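
Accuracy recovery is commonly expressed as the sparse model's score divided by the dense model's score. This sketch applies that ratio to the two win rates in the table above. The function name `recovery` is illustrative.

```python
def recovery(sparse_score: float, dense_score: float) -> float:
    """Percentage of the dense model's score retained by the sparse model."""
    if dense_score <= 0:
        raise ValueError("dense_score must be positive")
    return 100.0 * sparse_score / dense_score


# Win rates from the AlpacaEval table: sparse 61.1, dense 62.0.
print(f"{recovery(61.1, 62.0):.1f}%")  # 98.5%
```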