sedrickkeh committed
Commit 3cfe01a · Parent(s): 0f18687
Update README.md
README.md CHANGED
```diff
@@ -10,7 +10,7 @@ tags:
 language:
 - en
 model-index:
-- name:
+- name: mistral-supra
   results:
   - task:
       type: text-generation
@@ -78,16 +78,17 @@ model-index:
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and uptrained to become a linear RNN.
 
 This model accompanies our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.
 
 
 ## Model Details
 - **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
-- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA](
+- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA]() architecture.
 - **Dataset**: Initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1). Uptrained on 100B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
 - **Tokenizer**: `mistralai/Mistral-7B-v0.1`
-- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/)
+- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/) (we use a [fork](https://github.com/TRI-ML/linear_open_lm/) of OpenLM that supports linear attention)
 - **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 
 | Parameters | Hidden Size | Layers | Vocab Size | Sequence Length |
```
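For context on the change above: the updated text states that the linearized model can function as both a transformer and a recurrent model at inference time. The sketch below illustrates that property for generic causal linear attention. It is a minimal illustration, not the SUPRA/OpenLM implementation; the feature map `phi`, the shapes, and the normalization are assumptions, and the actual kernel lives in the TRI-ML/linear_open_lm fork linked in the diff.

```python
# Minimal sketch of causal linear attention with an illustrative ELU+1 feature
# map. This is not the SUPRA implementation; see the TRI-ML/linear_open_lm fork
# referenced in the README for the real code.
import torch
import torch.nn.functional as F

def phi(x):
    # Positive feature map (assumption); SUPRA's actual choice may differ.
    return F.elu(x) + 1

def parallel_form(q, k, v):
    # "Transformer-style" pass over the whole sequence at once (q, k, v: (T, d)).
    q, k = phi(q), phi(k)
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(1), dim=0)  # (T, d, d_v)
    z = torch.cumsum(k, dim=0)                                  # (T, d)
    num = torch.einsum("td,tde->te", q, kv)
    den = (q * z).sum(-1, keepdim=True) + 1e-6
    return num / den

def recurrent_form(q, k, v):
    # The same computation run as an RNN with a fixed-size state (S, z).
    q, k = phi(q), phi(k)
    S = torch.zeros(q.shape[-1], v.shape[-1])
    z = torch.zeros(q.shape[-1])
    outs = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])   # accumulate key-value outer products
        z = z + k[t]                      # accumulate keys for normalization
        outs.append(S.T @ q[t] / (q[t] @ z + 1e-6))
    return torch.stack(outs)

torch.manual_seed(0)
q, k, v = torch.randn(3, 8, 16).unbind(0)
print(torch.allclose(parallel_form(q, k, v), recurrent_form(q, k, v), atol=1e-4))
```

The script prints `True`: the parallel and recurrent passes agree to numerical tolerance, which is the property the README refers to when it says the uptrained model can run either as a transformer or as a constant-memory recurrent model.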