---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
datasets:
- the_pile_books3
tags:
- mosaicML
- sharded
- instruct
---

# mpt-7b-instruct: sharded


This is a version of the [mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) model, sharded into 2 GB chunks for low-RAM loading (e.g., Colab).
The weights are stored in `bfloat16`, so in principle the model can run on CPU, though generation will be very slow.
Original sharding code and credit go to [mpt-7b-storywriter-sharded](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded).
See the [community discussion](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded/discussions/2) for how to replicate the sharding.

Please refer to the linked repo for details on usage and implementation. The weights were downloaded from the original repo under the Apache-2.0 license and are redistributed under the same license.


## Basic Usage

> Note when using: this **is** an instruction-tuned model, so it responds best to instruction-style prompts rather than plain text to continue; a format sketch follows.
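
As a reference, here is a minimal prompt-format sketch. It assumes the Dolly-style instruction template described on the upstream [mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) card; double-check the exact wording there before relying on it:

```python
# Dolly-style instruction template (assumed from the upstream mpt-7b-instruct card).
INSTRUCTION_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_prompt(instruction: str) -> str:
    """Wrap a plain-language instruction in the template above."""
    return INSTRUCTION_TEMPLATE.format(instruction=instruction)
```
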
Install/upgrade packages:

```bash
pip install -U torch transformers accelerate einops
```

Load the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'jprafael/mpt-7b-instruct-sharded'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # required: MPT uses custom modeling code
    revision='8d8911ad980f48f8a791e5f5876dea891dcbc064',  # optional, but pinning a revision is a good idea
    device_map='auto',  # place layers across available GPU(s)/CPU
    load_in_8bit=False,  # install bitsandbytes, then set to True for 8-bit loading
)
model = torch.compile(model)  # optional; may speed up inference
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Then you can use `model.generate()` as you would normally; see the notebook for details.
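
As a quick illustration, here is a minimal generation sketch. It reuses the `format_prompt` helper from the prompt-format sketch above; the prompt text and generation parameters are illustrative, not from the original card:

```python
# Build an instruction-style prompt (format_prompt is defined in the sketch above).
prompt = format_prompt("Explain what model sharding is in one paragraph.")
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

# Generate a short completion; tune max_new_tokens / temperature to taste.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```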


---