sedrickkeh committed
Commit 329ccb0
1 Parent(s): ab4dcf3

Update README.md

Files changed (1):
  README.md +2 -3
README.md CHANGED
@@ -77,7 +77,7 @@ model-index:
 # Mistral-SUPRA
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and up-trained into a linear RNN.
 
-This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+This is an accompanying model of our paper [Linearizing Large Language Models](https://arxiv.org/abs/2405.06640), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
 Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.
@@ -176,9 +176,8 @@ If you use this model, please cite our paper on Linearizing Large Language Model
 @article{Mercat2024Linearizing,
   title={Linearizing Large Language Models},
   author={Jean Mercat and Igor Vasiljevic and Sedrick Keh and Kushal Arora and Achal Dave and Adrien Gaidon and Thomas Kollar},
-  journal={ArXiv},
   year={2024},
-  volume={},
+  journal={arXiv preprint arXiv:2405.06640},
 }
 ```
 
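The README paragraph in the diff makes a concrete technical claim: after linearization, the same weights can run either in a parallel, transformer-style pass or as a recurrent model with a fixed-size state. Below is a minimal illustrative sketch of why that equivalence holds for a generic causal linear attention layer, using an `elu + 1` feature map as a stand-in; this is an assumption-laden toy, not the SUPRA implementation from the linear_open_lm repository.

```python
# Illustrative sketch only: generic causal linear attention, not the
# authors' SUPRA code. Shows that the parallel (transformer-style) and
# recurrent (RNN-style) forms compute identical outputs.
import torch

def feature_map(x):
    # A common positivity-ensuring feature map; SUPRA's actual choice may differ.
    return torch.nn.functional.elu(x) + 1

def parallel_form(q, k, v):
    # q, k: (T, d); v: (T, e). Causality via prefix sums over time.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("td,te->tde", k, v).cumsum(dim=0)  # running sum of k_t v_t^T
    z = k.cumsum(dim=0)                                  # running sum of k_t
    num = torch.einsum("td,tde->te", q, kv)
    den = (q * z).sum(dim=-1, keepdim=True)
    return num / den

def recurrent_form(q, k, v):
    # Same outputs, one token at a time, carrying a fixed-size RNN state.
    q, k = feature_map(q), feature_map(k)
    state = q.new_zeros(k.shape[1], v.shape[1])  # (d, e) accumulated k v^T
    z = q.new_zeros(k.shape[1])                  # (d,) accumulated normalizer
    outs = []
    for t in range(q.shape[0]):
        state = state + torch.outer(k[t], v[t])
        z = z + k[t]
        outs.append(q[t] @ state / (q[t] @ z))
    return torch.stack(outs)

q, k, v = torch.randn(5, 4), torch.randn(5, 4), torch.randn(5, 4)
assert torch.allclose(parallel_form(q, k, v), recurrent_form(q, k, v), atol=1e-5)
```

The recurrent form carries only a (d, e) state matrix and a (d,) normalizer between tokens, which is what enables constant-memory generation; the parallel form computes the same prefix sums in one pass, as in ordinary transformer training.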