Abzu/mpt-30b-instruct-q8

Text Generation

text-generation-inference

Inference Endpoints

8-bit precision


Danivilanova committed on Jul 6, 2023

Commit 12d0eab (1 parent: 3544572)

Update README.md

Files changed (1)

README.md +57 -0

@@ -17,6 +17,63 @@ tags:
 inference: false
 ---
+# MosaicML's MPT-30B-Instruct 8-bit
+This repository contains 8-bit model files, in `.safetensors` format, for [MosaicML's MPT-30B-Instruct](https://huggingface.co/mosaicml/mpt-30b-instruct).
+## How to convert
+```python
+import torch
+import transformers
+
+# Load the model
+name = 'mosaicml/mpt-30b-instruct'
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
+config.attn_config['attn_impl'] = 'triton'  # use the triton-based FlashAttention implementation
+config.init_device = 'cuda:0'  # initialize directly on the GPU for speed
+model = transformers.AutoModelForCausalLM.from_pretrained(
+    name,
+    config=config,
+    torch_dtype=torch.bfloat16,  # dtype for weights that are not quantized to 8-bit
+    trust_remote_code=True,
+    load_in_8bit=True,  # quantize to 8-bit on load (requires bitsandbytes)
+)
+# Drop the non-tensor "weight_format" entries, since safetensors can only store tensors
+def filter_dict(dictionary):
+    return {key: value for key, value in dictionary.items() if "weight_format" not in key}
+
+new_state_dict = filter_dict(model.state_dict())
+# Save the 8-bit model in safetensors format
+model.save_pretrained('mpt-30b-instruct-8bits', state_dict=new_state_dict, safe_serialization=True)
+```
+## How to use
+```python
+import transformers
+
+# Load the converted 8-bit model from the local directory saved above
+model = transformers.AutoModelForCausalLM.from_pretrained(
+    'mpt-30b-instruct-8bits',
+    trust_remote_code=True,
+)
+```
+## Prompt template
+```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction
+{prompt}
+### Response
+```
 # MPT-30B-Instruct
 MPT-30B-Instruct is a model for short-form instruction following.
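
The prompt template added in this commit can be filled in programmatically before passing text to the model. The following is a minimal sketch, not part of the commit: the names `PROMPT_TEMPLATE` and `format_prompt` are hypothetical, and the trailing newline after `### Response` is an assumption, since the README does not show what follows that line.

```python
# Template text copied from the "Prompt template" section of the README diff.
# Assumption: a single newline follows "### Response"; the README does not say.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction\n"
    "{prompt}\n"
    "### Response\n"
)


def format_prompt(instruction: str) -> str:
    """Insert a user instruction into the README's prompt template."""
    # Strip surrounding whitespace so the instruction sits cleanly between the headers.
    return PROMPT_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(format_prompt("Summarize the plot of Hamlet in one sentence."))
```

The resulting string would then be tokenized and passed to the model's `generate` method in the usual way; the tokenizer and generation settings are outside what this commit documents.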