sam-mosaic committed
Commit 2fcc9d9
1 Parent(s): fc67f07
Update README.md
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
 
 # MPT-7B-Chat
 
-MPT-7B-8k is a chatbot-like model for dialogue generation.
+MPT-7B-Chat-8k is a chatbot-like model for dialogue generation.
 It was built by finetuning [MPT-7B-8k](https://huggingface.co/mosaicml/mpt-7b-8k) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
 [GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot) and some generated datasets.
 * License: _CC-By-NC-SA-4.0_ (non-commercial use only)
@@ -56,7 +56,7 @@ This model is best used with the MosaicML [llm-foundry repository](https://githu
 ```python
 import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-8k',
+  'mosaicml/mpt-7b-chat-8k',
   trust_remote_code=True
 )
 ```
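For readers skimming the diff, here is a minimal end-to-end sketch of how the corrected checkpoint id could be exercised after this change. The tokenizer pairing and the generation settings are illustrative assumptions, not something this commit specifies:

```python
import transformers

# Same call as in the updated README, with the corrected repo id.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-chat-8k',
  trust_remote_code=True
)

# MPT repos do not bundle a custom tokenizer class; the GPT-NeoX-20B tokenizer
# is the usual pairing, but verify against the model card (assumption).
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

prompt = 'What is a quoll?'
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```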
@@ -69,7 +69,7 @@ To use the optimized [triton implementation](https://github.com/openai/triton) o
 import torch
 import transformers
 
-name = 'mosaicml/mpt-7b-8k'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton' # change this to use triton-based FlashAttention
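As context, a sketch of how the modified config is typically handed back to `from_pretrained` to finish the triton-FlashAttention setup; the `init_device` and dtype choices below are illustrative and assume a CUDA GPU with the `triton` package installed:

```python
import torch
import transformers

name = 'mosaicml/mpt-7b-chat-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # triton-based FlashAttention
config.init_device = 'cuda:0'  # initialize weights directly on the GPU (assumption)

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16,  # bfloat16 weights keep the 7B model within a single GPU's memory
  trust_remote_code=True
)
```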
@@ -88,7 +88,7 @@ The model was trained initially with a sequence length of 2048 with an additiona
 ```python
 import transformers
 
-name = 'mosaicml/mpt-7b-8k'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384
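Because MPT's ALiBi position encoding is what allows raising `max_seq_len` beyond the training length, a quick sketch of loading with the larger window and checking a long prompt against it may be useful; the tokenizer pairing and the local file below are hypothetical stand-ins:

```python
import transformers

name = 'mosaicml/mpt-7b-chat-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)

# Rough check that a long prompt still leaves room for the generated reply
# ('transcript.txt' is a stand-in for whatever long document you feed in).
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
with open('transcript.txt') as f:
    prompt_tokens = tokenizer(f.read())['input_ids']
assert len(prompt_tokens) < config.max_seq_len, 'prompt alone exceeds the context window'
```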
@@ -173,8 +173,8 @@ The model was trained with sharded data parallelism using [FSDP](https://pytorch
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
 
-MPT-7B-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information.
-MPT-7B-8k was trained on various public datasets.
+MPT-7B-Chat-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information.
+MPT-7B-Chat-8k was trained on various public datasets.
 While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 
 ## Acknowledgements