

	---
	tags:
	- quantized
	- 2-bit
	- 3-bit
	- 4-bit
	- 5-bit
	- 6-bit
	- 8-bit
	- GGUF
	- transformers
	- safetensors
	- mistral
	- text-generation
	- arxiv:2304.12244
	- arxiv:2306.08568
	- arxiv:2308.09583
	- license:apache-2.0
	- autotrain_compatible
	- endpoints_compatible
	- text-generation-inference
	- region:us
	model_name: WizardLM-2-8x22B-GGUF
	base_model: microsoft/WizardLM-2-8x22B
	inference: false
	model_creator: microsoft
	pipeline_tag: text-generation
	quantized_by: MaziyarPanahi
	---
	# [MaziyarPanahi/WizardLM-2-8x22B-GGUF](https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF)
	- Model creator: [microsoft](https://huggingface.co/microsoft)
	- Original model: [microsoft/WizardLM-2-8x22B](https://huggingface.co/microsoft/WizardLM-2-8x22B)

	## Description
	[MaziyarPanahi/WizardLM-2-8x22B-GGUF](https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF) contains GGUF format model files for [microsoft/WizardLM-2-8x22B](https://huggingface.co/microsoft/WizardLM-2-8x22B).

	## Load sharded model

	Pass only the first shard to `-m`: `llama_load_model_from_file` detects the number of split files and loads the remaining tensors from the other shards.

	```sh
	llama.cpp/main -m WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
	```
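Because only the first shard is named on the command line, a missing shard is easy to overlook. The sketch below is a hypothetical helper (`list_shards` is not part of llama.cpp) that prints the file names implied by the first shard's name. It assumes the `NAME-00001-of-0000N.gguf` pattern seen in the command above.

```sh
# Hypothetical helper, not part of llama.cpp: print every shard file name
# implied by the first shard, assuming the NAME-00001-of-0000N.gguf pattern.
# POSIX sh has no local variables, so the names below are globals.
list_shards() {
    first=$1
    # Strip the "-00001-of-0000N.gguf" suffix to get the common prefix.
    prefix=${first%-[0-9][0-9][0-9][0-9][0-9]-of-[0-9][0-9][0-9][0-9][0-9].gguf}
    if [ "$prefix" = "$first" ]; then
        echo "not a sharded GGUF file name: $first" >&2
        return 1
    fi
    # Extract the total shard count and drop its leading zeros.
    total=${first##*-of-}
    total=${total%.gguf}
    total=$(echo "$total" | sed 's/^0*//')
    i=1
    while [ "$i" -le "$total" ]; do
        printf '%s-%05d-of-%05d.gguf\n' "$prefix" "$i" "$total"
        i=$((i + 1))
    done
}
```

One way to use it is to loop over the output before starting inference, for example `for f in $(list_shards WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf); do [ -f "$f" ] || echo "missing: $f"; done`.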


	## Prompt template

	```
	{system_prompt}
	USER: {prompt}
	ASSISTANT: </s>
	```

	or, as a multi-turn conversation with a default system prompt:

	```
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful,
	detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>
	USER: {prompt} ASSISTANT: </s>......
	```
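As a rough illustration, the single-turn form of the template can be assembled in the shell before it is passed to `-p`. `build_prompt` is a hypothetical helper, and the sketch assumes that the trailing `</s>` marks the end of the assistant's reply and is therefore left out of the prompt so the model can generate the answer. Check that assumption against the original model card before relying on it.

```sh
# Hypothetical helper: fill the single-turn template with a system prompt
# and a user message.
# Assumption: "</s>" closes the assistant's turn, so the prompt stops at
# "ASSISTANT:" and the model produces the reply.
build_prompt() {
    system_prompt=$1
    user_prompt=$2
    printf '%s USER: %s ASSISTANT:' "$system_prompt" "$user_prompt"
}

# Example use with the flags shown earlier in this README:
#   llama.cpp/main -m WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf \
#       -p "$(build_prompt "You are a helpful assistant." "Who are you?")" -n 256
```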