---
license: mit
---
An unofficial reproduction of the PRepBN-Llama-350M checkpoint for [SLAB](https://github.com/xinghaochen/SLAB/).
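
A minimal loading sketch, assuming the checkpoint is packaged in the standard `transformers` layout; the repo id below is a placeholder, and since PRepBN swaps LayerNorm for progressively re-parameterized BatchNorm, the checkpoint may rely on custom modeling code (hence `trust_remote_code=True`).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-model-repo>"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True in case the PRepBN modules ship as custom code.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Smoke test: greedy-decode a few tokens.
inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```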

### Model Sources
- **Repository:** [https://github.com/xinghaochen/SLAB](https://github.com/xinghaochen/SLAB/)
- **Paper:** [SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization](https://arxiv.org/abs/2405.11582)

## Evaluation
Evaluation follows the upstream SLAB repository's Llama setup (https://github.com/xinghaochen/SLAB/tree/main/llama); run its `evaluation.py` against the checkpoint:

```bash
python evaluation.py --ckpt <checkpoint-path>
```
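
For a rough sanity check outside the upstream script, a generic sliding-window perplexity sketch on WikiText-2 is below. This is not the SLAB evaluation protocol; the dataset, context length, and stride are assumptions, so use `evaluation.py` above for numbers comparable to the paper.

```python
# Generic WikiText-2 perplexity (NOT the upstream SLAB protocol).
# Assumes the checkpoint loads through `transformers` as sketched earlier.
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-model-repo>"  # placeholder: replace with this repository's id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

max_length, stride = 1024, 512  # assumed context length and stride
nll_sum, n_tokens, prev_end = 0.0, 0, 0
for begin in range(0, ids.size(1), stride):
    end = min(begin + max_length, ids.size(1))
    input_ids = ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, : -(end - prev_end)] = -100  # score only tokens new to this window
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    scored = (target_ids[:, 1:] != -100).sum().item()  # tokens the shifted loss covers
    nll_sum += loss.item() * scored
    n_tokens += scored
    prev_end = end
    if end == ids.size(1):
        break

print(f"WikiText-2 perplexity: {math.exp(nll_sum / n_tokens):.2f}")
```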

Reference results from the upstream SLAB repository: [Results](https://github.com/xinghaochen/SLAB/blob/main/docs/llama.png)

**BibTeX:**

```bibtex
@inproceedings{guo2024slab,
  title={SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization},
  author={Guo, Jialong and Chen, Xinghao and Tang, Yehui and Wang, Yunhe},
  booktitle={International Conference on Machine Learning},
  year={2024}
}
```