tsumeone/wizard-vicuna-13b-4bit-128g-cuda

---
library_name: transformers
pipeline_tag: text-generation
---
4-bit, group size 128 quantization of https://huggingface.co/junelee/wizard-vicuna-13b, tested and working with Occam's KoboldAI/GPTQ fork.

Someone has already published a Triton quant at https://huggingface.co/fbjr/wizard-vicuna-13b-4bit-128g, but it does not work with Occam's KoboldAI/GPTQ fork.

Note that, in my opinion, this model is fairly heavily censored: it gives moralizing AI-assistant responses to prompts that Vicuna 1.1 answers without complaint.

Quantization command:

```sh
python llama.py ./wizard-vicuna-13b c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
```
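The `--wbits 4 --groupsize 128` flags mean each weight is stored as a 4-bit integer code and every run of 128 consecutive weights shares one scale and zero point, which is where the `4bit-128g` in the file name comes from. The sketch below illustrates that group-wise layout in plain Python. It is a simplified stand-in, not GPTQ itself: it uses plain round-to-nearest, whereas GPTQ also adjusts the remaining weights to compensate for each rounding error. The function names and the assumed storage costs in `bits_per_weight` (one fp16 scale and one 4-bit zero point per group) are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

WBITS = 4              # mirrors --wbits 4
GROUPSIZE = 128        # mirrors --groupsize 128
QMAX = 2 ** WBITS - 1  # largest unsigned 4-bit code (15)


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Round-to-nearest quantization of one group to 4-bit codes with a shared scale and zero point.

    Simplified sketch: real GPTQ additionally compensates rounding error, which is not done here.
    """
    # Widen the range to include 0 so the zero point always lands in [0, QMAX].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero = round(-lo / scale)
    # Shift by the zero point and clamp into the unsigned 4-bit range.
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Reconstruct approximate float weights from 4-bit codes."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row: List[float]) -> List[Tuple[List[int], float, int]]:
    """Split one weight row into chunks of GROUPSIZE and quantize each chunk independently."""
    return [quantize_group(row[i:i + GROUPSIZE]) for i in range(0, len(row), GROUPSIZE)]


def bits_per_weight(wbits: int = WBITS, groupsize: int = GROUPSIZE,
                    scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average storage cost per weight, assuming one fp16 scale and one 4-bit zero point per group."""
    return wbits + (scale_bits + zero_bits) / groupsize
```

Under those assumed metadata sizes, `bits_per_weight()` gives 4 + 20/128 = 4.15625 bits per weight, so the per-group scale and zero point add roughly 4% on top of the 4-bit codes. A smaller group size tracks local weight ranges more closely at the cost of more metadata.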