sedrickkeh committed
Update README.md
README.md CHANGED
@@ -77,7 +77,7 @@ model-index:
 # Mistral-SUPRA
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and up-trained into a linear RNN.
 
-This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+This is an accompanying model of our paper [Linearizing Large Language Models](https://arxiv.org/abs/2405.06640), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
 Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.
@@ -176,9 +176,8 @@ If you use this model, please cite our paper on Linearizing Large Language Model
 @article{Mercat2024Linearizing,
   title={Linearizing Large Language Models},
   author={Jean Mercat and Igor Vasiljevic and Sedrick Keh and Kushal Arora and Achal Dave and Adrien Gaidon and Thomas Kollar},
-  journal={ArXiv},
   year={2024},
-
+  journal={arXiv preprint arXiv:2405.06640},
 }
 ```
 