sedrickkeh committed
Commit 3cfe01a · Parent(s): 0f18687
Update README.md
README.md CHANGED
```diff
@@ -10,7 +10,7 @@ tags:
 language:
 - en
 model-index:
-- name:
+- name: mistral-supra
   results:
   - task:
       type: text-generation
@@ -78,16 +78,17 @@ model-index:
 This model was initialized from the weights of the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) transformer model and uptrained to become a linear RNN.
 
 This model accompanies our paper [Linearizing Large Language Models](), where we detail our process of converting a softmax transformer into a linear transformer, which at inference time can function as both a transformer and a recurrent model.
+Our linear attention code can be found at https://github.com/TRI-ML/linear_open_lm/
 
 We uptrain Mistral-7B on 100B tokens of RefinedWeb.
 
 
 ## Model Details
 - **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
-- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA](
+- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA]() architecture.
 - **Dataset**: Initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1). Uptrained on 100B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
 - **Tokenizer**: `mistralai/Mistral-7B-v0.1`
-- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/)
+- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/) (we use a [fork](https://github.com/TRI-ML/linear_open_lm/) of OpenLM that supports linear attention)
 - **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 
 | Parameters | Hidden Size | Layers | Vocab Size | Sequence Length |
```
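For context on the change above: the updated text states that the linearized model can function as both a transformer and a recurrent model at inference time. The sketch below illustrates that property for generic causal linear attention. It is a minimal illustration, not the SUPRA/OpenLM implementation; the feature map `phi`, the shapes, and the normalization are assumptions, and the actual kernel lives in the TRI-ML/linear_open_lm fork linked in the diff.

```python
# Minimal sketch of causal linear attention with an illustrative ELU+1 feature
# map. This is not the SUPRA implementation; see the TRI-ML/linear_open_lm fork
# referenced in the README for the real code.
import torch
import torch.nn.functional as F

def phi(x):
    # Positive feature map (assumption); SUPRA's actual choice may differ.
    return F.elu(x) + 1

def parallel_form(q, k, v):
    # "Transformer-style" pass over the whole sequence at once (q, k, v: (T, d)).
    q, k = phi(q), phi(k)
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(1), dim=0)  # (T, d, d_v)
    z = torch.cumsum(k, dim=0)                                  # (T, d)
    num = torch.einsum("td,tde->te", q, kv)
    den = (q * z).sum(-1, keepdim=True) + 1e-6
    return num / den

def recurrent_form(q, k, v):
    # The same computation run as an RNN with a fixed-size state (S, z).
    q, k = phi(q), phi(k)
    S = torch.zeros(q.shape[-1], v.shape[-1])
    z = torch.zeros(q.shape[-1])
    outs = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])   # accumulate key-value outer products
        z = z + k[t]                      # accumulate keys for normalization
        outs.append(S.T @ q[t] / (q[t] @ z + 1e-6))
    return torch.stack(outs)

torch.manual_seed(0)
q, k, v = torch.randn(3, 8, 16).unbind(0)
print(torch.allclose(parallel_form(q, k, v), recurrent_form(q, k, v), atol=1e-4))
```

The script prints `True`: the parallel and recurrent passes agree to numerical tolerance, which is the property the README refers to when it says the uptrained model can run either as a transformer or as a constant-memory recurrent model.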