TheBloke committed
Commit 5f0a3a1
1 Parent(s): b608023

Updating model files

Files changed (1)
  1. README.md +24 -2
README.md CHANGED
@@ -8,6 +8,17 @@ tags:
- llm-foundry
inference: false
---
+ <div style="width: 100%;">
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
+ </div>
+ <div style="display: flex; justify-content: space-between; width: 100%;">
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
+ <p><a href="https://discord.gg/UBgz4VXf">Chat & support: my new Discord server</a></p>
+ </div>
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
+ <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? Patreon coming soon!</a></p>
+ </div>
+ </div>

# MPT-7B-Instruct GGML

@@ -59,13 +70,24 @@ bin/mpt -m /path/to/mpt7b-instruct.ggmlv3.q4_0.bin -t 8 -n 512 -p "Write a story

Please see the ggml repo for other build options.

+ ## Want to support my work?
+
+ I've had a lot of people ask if they can contribute. I love providing models and helping people, but it is starting to rack up pretty big cloud computing bills.
+
+ So if you're able and willing to contribute, it'd be most gratefully received and will help me to keep providing models, and work on various AI projects.
+
+ Donaters will get priority support on any and all AI/LLM/model questions, and I'll gladly quantise any model you'd like to try.
+
+ * Patreon: coming soon! (just awaiting approval)
+ * Ko-Fi: https://ko-fi.com/TheBlokeAI
+ * Discord: https://discord.gg/UBgz4VXf
# Original model card: MPT-7B-Instruct


# MPT-7B-Instruct

MPT-7B-Instruct is a model for short-form instruction following.
- It is built by finetuning [MPT-7B](https://huggingface.co/spaces/mosaicml/mpt-7b) on a [dataset](https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
+ It is built by finetuning [MPT-7B](https://huggingface.co/spaces/mosaicml/mpt-7b) on a [dataset](https://huggingface.co/datasets/sam-mosaic/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets.
* License: _CC-By-SA-3.0_
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct)
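The hunk header above quotes the README's `bin/mpt` run command. As a rough sketch of how that binary is usually obtained, assuming the upstream `ggerganov/ggml` repository's CMake build and its `mpt` example target (the clone URL, build steps, and target name are assumptions, not taken from this commit):

```bash
# Sketch only: build the ggml 'mpt' example and run a quantised MPT GGML file.
# The repo URL, build layout and 'mpt' make target are assumed from upstream ggml.
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 mpt

# Run roughly as in the hunk header above; the model path and prompt are illustrative.
bin/mpt -m /path/to/mpt7b-instruct.ggmlv3.q4_0.bin -t 8 -n 512 -p "Your prompt here"
```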
 
@@ -108,7 +130,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
  trust_remote_code=True
)
```
- Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
+ Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
`MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
 
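For context on the `trust_remote_code` note in the last hunk above, here is a minimal, self-contained sketch of the loading pattern it describes; the model id `mosaicml/mpt-7b-instruct` is an assumption here, since the diff only shows the tail of the original card's snippet:

```python
# Sketch only: load an MPT checkpoint whose architecture code ships with the model
# repo rather than with the transformers library itself. The model id is assumed.
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,  # required because MPT is a custom architecture not yet in transformers
)
```

Since `trust_remote_code=True` executes Python code downloaded with the checkpoint, it should only be set for repositories you trust.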