bconsolvo committed
Commit 9312cfb
Parent: 2c15921

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -10,6 +10,9 @@ datasets:
 ## Model Details: 90% Sparse BERT-Base (uncased) Prune Once For All
 This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. Weight pruning forces some of the weights of the neural network to zero, which results in sparse weight matrices. Updating neural network weights involves matrix multiplication, and if we can keep the matrices sparse while retaining enough important information, we can reduce the overall computation overhead. The "sparse" in the title of the model indicates the ratio of sparsity in the weights; for more details, you can read [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754).
 
+Visualization of the Prune Once for All method from [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754):
+![Zafrir2021_Fig1.png](https://s3.amazonaws.com/moonup/production/uploads/6297f0e30bd2f58c647abb1d/nSDP62H9NHC1FA0C429Xo.png)
+
 | Model Detail | Description |
 | ----------- | ----------- |
 | Model Authors - Company | Intel |
@@ -21,9 +24,6 @@ This model is a sparse pre-trained model that can be fine-tuned for a wide range
 | License | Apache 2.0 |
 | Questions or Comments | [Community Tab](https://huggingface.co/Intel/bert-base-uncased-sparse-90-unstructured-pruneofa/discussions) and [Intel Developers Discord](https://discord.gg/rv2Gp55UJQ)|
 
-Visualization of the Prune Once for All method from [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754). More details can be found in their paper.
-![Zafrir2021_Fig1.png](https://s3.amazonaws.com/moonup/production/uploads/6297f0e30bd2f58c647abb1d/nSDP62H9NHC1FA0C429Xo.png)
-
 | Intended Use | Description |
 | ----------- | ----------- |
 | Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) SQuADv1.1, QNLI, MNLI, SST-2 and QQP. |
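
The README paragraph in the first hunk explains that pruning zeroes out most of the weights, yielding sparse weight matrices. As a quick, hedged sanity check of the "90% sparse" claim, the sketch below loads the checkpoint with the Hugging Face `transformers` library and measures the fraction of zero-valued entries in the 2-D weight matrices; the model id is taken from the Community Tab link above, and `transformers` plus `torch` are assumed to be installed. This is an illustrative sketch, not part of the model card.

```python
# Minimal sketch (not from the model card): load the sparse checkpoint and
# measure how many entries of its weight matrices are exactly zero.
# Assumes `transformers` and `torch` are installed; the model id is taken
# from the Community Tab URL in the card.
import torch
from transformers import AutoModel

model_id = "Intel/bert-base-uncased-sparse-90-unstructured-pruneofa"
model = AutoModel.from_pretrained(model_id)

total, zeros = 0, 0
for name, param in model.named_parameters():
    # Unstructured pruning typically targets the 2-D weight matrices;
    # biases and LayerNorm parameters are usually left dense.
    if param.dim() == 2:
        total += param.numel()
        zeros += (param == 0).sum().item()

print(f"Fraction of zero-valued weight entries: {zeros / total:.2%}")
```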
 
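The "Primary intended uses" row states that the checkpoint is not ready for downstream prediction as-is but can be fine-tuned for tasks such as SST-2, QNLI, or SQuADv1.1. Below is a hypothetical starting point for such fine-tuning; the task, label count, and tokenizer assumptions are illustrative and not taken from the model card.

```python
# Hypothetical fine-tuning starting point (not from the model card): attach a
# sequence-classification head to the sparse checkpoint, e.g. for SST-2.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Intel/bert-base-uncased-sparse-90-unstructured-pruneofa"

# Assumes the repo ships standard bert-base-uncased tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The classification head is newly initialized; only the pruned BERT encoder
# weights come from the sparse checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# From here the usual fine-tuning loop applies (e.g. transformers.Trainer on an
# SST-2 style dataset). Note that plain fine-tuning does not by itself keep the
# weights sparse; Zafrir et al. (2021) describe how sparsity is preserved
# during downstream training.
```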