sam-mosaic committed
Commit 2fcc9d9
1 Parent(s): fc67f07
Update README.md
README.md CHANGED
@@ -26,7 +26,7 @@ inference: false
 
 # MPT-7B-Chat
 
-MPT-7B-8k is a chatbot-like model for dialogue generation.
+MPT-7B-Chat-8k is a chatbot-like model for dialogue generation.
 It was built by finetuning [MPT-7B-8k](https://huggingface.co/mosaicml/mpt-7b-8k) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
 [GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot) and some generated datasets.
 * License: _CC-By-NC-SA-4.0_ (non-commercial use only)
@@ -56,7 +56,7 @@ This model is best used with the MosaicML [llm-foundry repository](https://githu
 ```python
 import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-8k',
+  'mosaicml/mpt-7b-chat-8k',
   trust_remote_code=True
 )
 ```
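For readers skimming the diff, here is a minimal end-to-end sketch of how the corrected checkpoint id could be exercised after this change. The tokenizer pairing and the generation settings are illustrative assumptions, not something this commit specifies:

```python
import transformers

# Same call as in the updated README, with the corrected repo id.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-chat-8k',
  trust_remote_code=True
)

# MPT repos do not bundle a custom tokenizer class; the GPT-NeoX-20B tokenizer
# is the usual pairing, but verify against the model card (assumption).
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

prompt = 'What is a quoll?'
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```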
@@ -69,7 +69,7 @@ To use the optimized [triton implementation](https://github.com/openai/triton) o
 import torch
 import transformers
 
-name = 'mosaicml/mpt-7b-8k'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton' # change this to use triton-based FlashAttention
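As context, a sketch of how the modified config is typically handed back to `from_pretrained` to finish the triton-FlashAttention setup; the `init_device` and dtype choices below are illustrative and assume a CUDA GPU with the `triton` package installed:

```python
import torch
import transformers

name = 'mosaicml/mpt-7b-chat-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # triton-based FlashAttention
config.init_device = 'cuda:0'  # initialize weights directly on the GPU (assumption)

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  torch_dtype=torch.bfloat16,  # bfloat16 weights keep the 7B model within a single GPU's memory
  trust_remote_code=True
)
```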
@@ -88,7 +88,7 @@ The model was trained initially with a sequence length of 2048 with an additiona
 ```python
 import transformers
 
-name = 'mosaicml/mpt-7b-8k'
+name = 'mosaicml/mpt-7b-chat-8k'
 
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384
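Because MPT's ALiBi position encoding is what allows raising `max_seq_len` beyond the training length, a quick sketch of loading with the larger window and checking a long prompt against it may be useful; the tokenizer pairing and the local file below are hypothetical stand-ins:

```python
import transformers

name = 'mosaicml/mpt-7b-chat-8k'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,
  trust_remote_code=True
)

# Rough check that a long prompt still leaves room for the generated reply
# ('transcript.txt' is a stand-in for whatever long document you feed in).
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
with open('transcript.txt') as f:
    prompt_tokens = tokenizer(f.read())['input_ids']
assert len(prompt_tokens) < config.max_seq_len, 'prompt alone exceeds the context window'
```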
@@ -173,8 +173,8 @@ The model was trained with sharded data parallelism using [FSDP](https://pytorch
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
 
-MPT-7B-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information.
-MPT-7B-8k was trained on various public datasets.
+MPT-7B-Chat-8k can produce factually incorrect output, and should not be relied on to produce factually accurate information.
+MPT-7B-Chat-8k was trained on various public datasets.
 While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 
 ## Acknowledgements