sedrickkeh committed on
Commit
3cfe01a
1 Parent(s): 0f18687

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 language:
 - en
 model-index:
- - name: mamba-7b
+ - name: mistral-supra
   results:
   - task:
       type: text-generation
@@ -78,16 +78,17 @@ model-index:
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and uptrained to become a linear RNN.
 
 This is an accompanying model of our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.
 
 
 ## Model Details
 - **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
-- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA](https://arxiv.org/abs/2312.00752) architecture.
+- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA]() architecture.
 - **Dataset**: Initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1). Uptrained on 100B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
 - **Tokenizer**: `mistralai/Mistral-7B-v0.1`
-- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/)
+- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/) (we use a [fork](https://github.com/TRI-ML/linear_open_lm/) of OpenLM that supports linear attention)
 - **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 
 | Parameters | Hidden Size | Layers | Vocab Size | Sequence Length |
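
The README's claim that a linearized model "can function as both a transformer and a recurrent model" can be sketched numerically: causal linear attention computed in parallel over a whole sequence equals a step-by-step recurrence over a fixed-size state. This toy sketch uses made-up dimensions and an `elu+1` feature map as a stand-in assumption, not SUPRA's actual learned feature map or Mistral-7B's sizes.

```python
import numpy as np

# Toy dimensions (assumptions for illustration only).
T, d = 5, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))

# Stand-in positive feature map (elu + 1); SUPRA uses a learned map instead.
def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

# Parallel ("transformer") form: causal linear attention over the full sequence.
mask = np.tril(np.ones((T, T)))
A = (phi(Q) @ phi(K).T) * mask
out_parallel = (A @ V) / A.sum(axis=1, keepdims=True)

# Recurrent ("RNN") form: carry a d x d state S and a running normalizer z.
S = np.zeros((d, d))
z = np.zeros(d)
out_recurrent = np.zeros((T, d))
for t in range(T):
    q, k, v = phi(Q[t]), phi(K[t]), V[t]
    S += np.outer(k, v)   # S_t = S_{t-1} + k_t v_t^T
    z += k                # normalizer accumulates feature-mapped keys
    out_recurrent[t] = (q @ S) / (q @ z)

# Both views produce the same outputs.
assert np.allclose(out_parallel, out_recurrent)
```

The recurrent form is what makes inference attractive: the per-token cost and memory are constant in sequence length, since only `S` and `z` are carried forward.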