sam-mosaic committed on
Commit 2fcc9d9
Parent: fc67f07

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
 
 # MPT-7B-Chat
 
-MPT-7B-8k-Chat is a chatbot-like model for dialogue generation.
+MPT-7B-Chat-8k is a chatbot-like model for dialogue generation.
 It was built by finetuning [MPT-7B-8k](https://huggingface.co/mosaicml/mpt-7b-8k) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
 [GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot) and some generated datasets.
 * License: _CC-By-NC-SA-4.0_ (non-commercial use only)
@@ -56,7 +56,7 @@ This model is best used with the MosaicML [llm-foundry repository](https://githu
 ```python
 import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-8k-chat',
+  'mosaicml/mpt-7b-chat-8k',
   trust_remote_code=True
 )
 ```
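The hunk above only swaps the repo id inside the `from_pretrained` call. As a point of reference, a minimal, self-contained version of that load using the id from the `+` side might look like the sketch below; the `torch_dtype` argument is an assumption added here and is not part of the diff.

```python
import torch
import transformers

# Repo id as written on the "+" side of the hunk above.
name = 'mosaicml/mpt-7b-chat-8k'

# trust_remote_code=True is required because MPT ships its model code
# inside the repo rather than in the transformers library itself.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # assumption: bfloat16 weights to reduce memory
    trust_remote_code=True,
)
```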
@@ -69,7 +69,7 @@ To use the optimized [triton implementation](https://github.com/openai/triton) o
 import torch
 import transformers
 
-name = 'mosaicml/mpt-7b-8k-chat'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton' # change this to use triton-based FlashAttention
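This hunk again only renames the checkpoint; the README snippet continues past the lines shown. A hedged reconstruction of the full triton/FlashAttention pattern follows, assuming the config knobs used by other MPT model cards (`init_device` and `torch_dtype` are assumptions, not visible in the diff).

```python
import torch
import transformers

name = 'mosaicml/mpt-7b-chat-8k'  # repo id from the "+" side of the hunk

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # use triton-based FlashAttention
config.init_device = 'cuda:0'  # assumption: initialize weights directly on GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # assumption: load weights in bfloat16
    trust_remote_code=True,
)
```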
@@ -88,7 +88,7 @@ The model was trained initially with a sequence length of 2048 with an additiona
 ```python
 import transformers
 
-name = 'mosaicml/mpt-7b-8k-chat'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384
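As with the previous hunks, only the `name` line changes; the visible context overrides `max_seq_len` on the config. The hunk cuts off before the model is actually loaded, so the `from_pretrained` call below is an assumption about how the snippet continues.

```python
import transformers

name = 'mosaicml/mpt-7b-chat-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

# Assumption: the README continues by passing the modified config back in.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```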
@@ -173,8 +173,8 @@ The model was trained with sharded data parallelism using [FSDP](https://pytorch
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
 
-MPT-7B-8k-Chat can produce factually incorrect output, and should not be relied on to produce factually accurate information.
-MPT-7B-8k-Chat was trained on various public datasets.
+MPT-7B-Chat-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information.
+MPT-7B-Chat-8k was trained on various public datasets.
 While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 
 ## Acknowledgements
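For completeness, a hedged end-to-end generation sketch against the renamed checkpoint: the tokenizer is loaded from the same repo (assuming it ships tokenizer files, as other MPT releases do), and the prompt, device, and sampling settings are illustrative only, not taken from the diff.

```python
import torch
import transformers

name = 'mosaicml/mpt-7b-chat-8k'  # repo id as written on the "+" side of this commit

tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Wrap model and tokenizer in a standard text-generation pipeline.
pipe = transformers.pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device='cuda:0',  # assumption: a single CUDA device is available
)

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(pipe('Here is a short poem about long context windows:\n',
               max_new_tokens=100,
               do_sample=True,
               use_cache=True))
```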
 