---
tags:
- deepsparse
---
# Sparse MPT-7B-Chat - DeepSparse
[Chat-aligned MPT-7B model](https://huggingface.co/mosaicml/mpt-7b-chat) pruned to 50% sparsity and quantized with SparseGPT for efficient inference with DeepSparse.
```python
from deepsparse import TextGeneration

model = TextGeneration(model="hf:neuralmagic/mpt-7b-chat-pruned50-quant")
model("Tell me a joke.", max_new_tokens=50)
```