	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	inference: false
	datasets:
	- the_pile_books3
	tags:
	- mosaicML
	- sharded
	- story
	---

	# mpt-7b-storywriter: sharded


	<a href="https://colab.research.google.com/gist/pszemraj/a979cdcc02edb916661c5dd97cf2294e/mpt-storywriter-sharded-inference.ipynb">
	<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
	</a>

	This is a version of the [mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) model, sharded into 2 GB chunks for loading in low-RAM environments (e.g., Colab). The weights are stored in `bfloat16`, so in theory you can run this on CPU, though generation will be very slow.
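
To give a feel for why 2 GB shards matter on a low-RAM machine, here is a back-of-the-envelope sketch of the weight footprint. The parameter count is an assumption (the nominal 7 billion taken from the "7b" in the model name), the helper names are made up for illustration, and the real number of shard files in this repo may differ.

```python
import math


def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def shard_count(total_gb: float, shard_gb: float = 2.0) -> int:
    """Number of files needed if no shard may exceed shard_gb."""
    return math.ceil(total_gb / shard_gb)


# Assumption: nominal parameter count read off the "7b" in the model name.
NOMINAL_PARAMS = 7e9

bf16_gb = checkpoint_size_gb(NOMINAL_PARAMS, bytes_per_param=2)  # bfloat16 = 2 bytes
int8_gb = checkpoint_size_gb(NOMINAL_PARAMS, bytes_per_param=1)  # 8-bit = 1 byte

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB in about {shard_count(bf16_gb)} shards of 2 GB")
print(f"8-bit weights:    ~{int8_gb:.0f} GB once quantized")
```

Because the loader can read one shard at a time, the host does not need to hold a single file as large as the whole checkpoint in memory while the weights are being placed on the target device.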

	Please refer to the original repo linked above for details on usage, implementation, etc. This model was downloaded from the original repo under Apache-2.0 and is redistributed under the same license.


	## Basic Usage

	> Note: this is not an instruction-tuned model, so give it enough input text for the continuation to stay on topic with your prompt.

	Install/upgrade packages:

	```bash
	pip install -U torch transformers accelerate einops
	```

	Load the model:

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = 'ethzanalytics/mpt-7b-storywriter-sharded'
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,  # matches the dtype the weights are stored in
	    trust_remote_code=True,  # MPT relies on custom modeling code from the repo
	    revision='197d14245ad874da82194248cab1ce8cf87fa713',  # optional, but a good idea
	    device_map='auto',  # let accelerate decide where to place the weights
	    load_in_8bit=False,  # install bitsandbytes, then set to True for 8-bit
	)
	model = torch.compile(model)  # requires PyTorch 2.0 or newer
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	```

	Then you can use `model.generate()` as you normally would; see the Colab notebook linked above for details.
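
As a rough illustration of the `model.generate()` step, the sketch below wraps tokenization, sampling, and decoding in one helper. The function name `continue_story` and the sampling values (`max_new_tokens`, `temperature`, `top_p`) are assumptions chosen for the example, not settings taken from the notebook; it expects the `model` and `tokenizer` objects created in the loading step.

```python
def continue_story(model, tokenizer, prompt, max_new_tokens=128,
                   temperature=0.8, top_p=0.95):
    """Sample a continuation of `prompt` and return prompt + continuation as text.

    Hypothetical helper: the default sampling values are illustrative only.
    """
    # Tokenize and move the tensors to the device holding the model's first weights.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        temperature=temperature,
        top_p=top_p,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
    )

    # The returned sequence contains the prompt tokens followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like `print(continue_story(model, tokenizer, "The lighthouse keeper had not seen a ship in forty days, and"))`. Since the model is not instruction-tuned, the prompt should read like the opening of the text you want continued rather than a request.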


	---