Some errors occurred when I used the webui. Can anyone tell me why?

#13 by yogurt111 - opened

Traceback (most recent call last):
  File "/Users/dev/WebstormProjects/text-generation-webui/server.py", line 103, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/dev/WebstormProjects/text-generation-webui/modules/models.py", line 85, in load_model
    model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=trust_remote_code)
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2405, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/stable-vicuna-13B-GPTQ.

Please follow the instructions in the README - you need to set the GPTQ parameters for the model: bits = 4, groupsize = 128, model_type = llama. Then click "save settings for this model" and "reload model".
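
If you prefer to set these at launch time instead of in the UI, the same parameters can be passed as command-line flags. A minimal sketch, assuming a version of text-generation-webui from around this time that still uses the --wbits/--groupsize/--model_type flags, and that your model folder is named stable-vicuna-13B-GPTQ:

python server.py --model stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type llama

This should be equivalent to saving the three settings in the UI and reloading the model.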

Can this webui use other models?

Yes of course - it supports most models out there: unquantised fp16 HF models on GPU, the same models in 8-bit on GPU, and these GPTQ models in 4-bit on GPU. It supports Llama models, GPT-J, OPT, GPT-NeoX, RWKV, and others.

And it also supports loading Llama GGML models for CPU inference.
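
If you want to sanity-check a GGML file on CPU outside the webui, here is a minimal sketch using the llama-cpp-python bindings (which, as far as I know, is also what the webui uses under the hood for GGML). The file name and prompt below are placeholders, not files from this repo:

from llama_cpp import Llama

# Placeholder path to a quantised GGML file - adjust to whatever you downloaded
llm = Llama(model_path="models/stable-vicuna-13B.ggml.q4_0.bin", n_ctx=2048)

# Run a short completion entirely on CPU
output = llm("### Human: Hello, who are you?\n### Assistant:", max_tokens=128)
print(output["choices"][0]["text"])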

I'd think text-generation-webui supports more models than any other UI right now.
