cgus
/

CausalLM-14B-exl2

Text Generation

text-generation-inference

Model card Files Files and versions Community

CausalLM-14B-exl2 / README.md

cgus's picture

Create README.md

44b735e 8 months ago

|

No virus

1.62 kB

	---
	base_model: CausalLM/14B
	datasets:
	- JosephusCheung/GuanacoDataset
	- Open-Orca/OpenOrca
	- stingning/ultrachat
	- meta-math/MetaMathQA
	- liuhaotian/LLaVA-Instruct-150K
	- jondurbin/airoboros-3.1
	- WizardLM/WizardLM_evol_instruct_V2_196k
	- RyokoAI/ShareGPT52K
	- RyokoAI/Fandom23K
	- milashkaarshif/MoeGirlPedia_wikitext_raw_archive
	- wikipedia
	- wiki_lingua
	- fnlp/moss-003-sft-data
	- garage-bAInd/Open-Platypus
	- LDJnr/Puffin
	- openbmb/llava_zh
	- BAAI/COIG
	- TigerResearch/tigerbot-zhihu-zh-10k
	- liwu/MNBVC
	- teknium/openhermes
	inference: false
	language:
	- en
	- zh
	license: wtfpl
	model_creator: CausalLM
	model_name: CausalLM 14B
	model_type: llama
	pipeline_tag: text-generation
	prompt_template: '<\|im_start\|>system
	{system_message}<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant
	'
	quantized_by: cgus
	tags:
	- llama
	- llama2
	---
	# CausalLM 14B - GPTQ
	- Model creator: [CausalLM](https://huggingface.co/CausalLM)
	- Original model: [CausalLM 14B](https://huggingface.co/CausalLM/14B)

	<!-- description start -->
	## Description

	Experimental exl2 quantization for CausalLM-14B for Exllamav2.
	I had some issues during quantization process, so I suspect it might have quality issues.
	3.5bpw version barely fits 12GB VRAM but has unusually high perplexity for wikitext dataset.
	I couldn't measure perplexity for 4bpw version and to compare it with TheBloke's GPTQ, so I have no idea if my quantization has issues or it supposed to be like this.

	You could try this exl2 version but I'd recommend to use [TheBloke's GPTQ](https://huggingface.co/TheBloke/CausalLM-14B-GPTQ) version instead.