---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
datasets:
- the_pile_books3
tags:
- mosaicML
- sharded
- instruct
---
# mpt-7b-instruct: sharded
This is a version of the [mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) model, sharded into 2 GB chunks so it can be loaded in low-RAM environments (e.g. Google Colab).
The weights are stored in `bfloat16`, so in principle the model can also run on CPU, though generation will be very slow.
Original code and credits go to [mpt-7b-storywriter-sharded](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded).
See the [community discussion](https://huggingface.co/ethzanalytics/mpt-7b-storywriter-sharded/discussions/2) on how to replicate this.
Please refer to the repos linked above for further details on usage and implementation. The model was downloaded from the original repo under the Apache-2.0 license and is redistributed under the same terms.
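For reference, the re-sharding step itself only requires `save_pretrained` with a `max_shard_size` argument. A minimal sketch (assuming enough RAM and disk to hold the full original checkpoint; the output directory name is illustrative):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original (unsharded) checkpoint; in bfloat16 this needs roughly 13-14 GB of memory.
model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-instruct')

# Re-save the weights in 2 GB shards to an illustrative output directory.
model.save_pretrained('mpt-7b-instruct-sharded', max_shard_size='2GB')
tokenizer.save_pretrained('mpt-7b-instruct-sharded')
```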
## Basic Usage
> Note when using: unlike the base mpt-7b, this **is** an instruction-tuned model, so it responds best to prompts written as instructions (see the example prompt format below) rather than raw text to continue.
>
Install/upgrade packages:
```bash
pip install -U torch transformers accelerate einops
```
Load the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = 'jprafael/mpt-7b-instruct-sharded'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    revision='8d8911ad980f48f8a791e5f5876dea891dcbc064',  # optional, but a good idea
    device_map='auto',
    load_in_8bit=False,  # install bitsandbytes then set to True for 8-bit
)
model = torch.compile(model)  # optional, requires PyTorch 2.x
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
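If GPU memory is tight, the same checkpoint can instead be loaded in 8-bit. A sketch assuming a CUDA GPU and that `bitsandbytes` is installed (drop `torch_dtype`, since quantization handles precision):
```python
# 8-bit variant (requires: pip install bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map='auto',
    load_in_8bit=True,
)
```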
Then you can use `model.generate()` as you would normally; see the notebook in the original sharded repo (linked above) for a complete walkthrough.
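A minimal generation example (the instruction text and sampling parameters are illustrative; the prompt template follows the dolly-style format that, to my understanding, the original mpt-7b-instruct was fine-tuned on):
```python
# Dolly-style instruction template (assumed from the original mpt-7b-instruct; adjust if needed)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what model sharding is and why it helps on low-RAM machines.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # the MPT tokenizer has no pad token by default
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```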
---