TRI-ML/mistral-supra

Text Generation

sedrickkeh committed on May 13, 2024

Commit 819d065 · verified · 1 Parent(s): 329ccb0

Update README.md

Files changed (1)

README.md +1 -1

@@ -85,7 +85,7 @@ We uptrain Mistral-7B on 100B tokens of RefinedWeb.
 ## Model Details
 - **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
-- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA]() architecture.
+- **Model Type**: This is an auto-regressive language model initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and uptrained into a linear model based on the [SUPRA](https://arxiv.org/abs/2405.06640) architecture.
 - **Dataset**: Initialized from [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1). Uprained on 100B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
 - **Tokenizer**: `mistralai/Mistral-7B-v0.1`
 - **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/) (we use a [fork](https://github.com/TRI-ML/linear_open_lm/) of OpenLM that supports linear attention)
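
For readers unfamiliar with the "linear model" and "linear attention" wording in the Model Details above, here is a minimal NumPy sketch of generic causal linear attention. It is an illustration only, not the SUPRA formulation and not code from the OpenLM fork: the `elu(x) + 1` feature map, the sum-based normalization, and all function names are assumptions chosen for the example. It shows that the quadratic, masked form and the fixed-state recurrent form produce the same output.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map, elu(x) + 1. The map used by SUPRA may differ.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_parallel(q, k, v):
    """Quadratic-time reference: causally masked phi(q) phi(k)^T, row-normalized."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)  # zero out attention to future positions
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def causal_linear_attention_recurrent(q, k, v):
    """Linear-time form: carry a fixed-size state instead of a growing KV cache."""
    d_k, d_v = k.shape[1], v.shape[1]
    state = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
    norm = np.zeros(d_k)          # running sum of phi(k_t)
    out = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        phi_q, phi_k = feature_map(q[t]), feature_map(k[t])
        state += np.outer(phi_k, v[t])
        norm += phi_k
        out[t] = (phi_q @ state) / (phi_q @ norm)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
    v = rng.normal(size=(5, 3))
    same = np.allclose(
        causal_linear_attention_parallel(q, k, v),
        causal_linear_attention_recurrent(q, k, v),
    )
    print("parallel and recurrent forms agree:", same)
```

The recurrent form keeps only `state` and `norm` between steps, so per-token cost and memory do not grow with sequence length, which is the usual motivation for converting a softmax transformer into a linear one.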