Can't load the model

#5
by Kelheor - opened

I'm trying to use it with oobabooga / text-generation-webui.
I can load the default vicuna-13b, but when I try to load your model, I get this error:

Traceback (most recent call last):
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 442, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\torch\serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\torch\serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 446, in load_state_dict
    if f.read(7) == "version":
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 28: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\text-generation-webui\text-generation-webui\server.py", line 84, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\text-generation-webui\text-generation-webui\modules\models.py", line 171, in load_model
    model = AutoModelForCausalLM.from_pretrained(checkpoint, **params)
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2560, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "C:\Users\kelhe\anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 458, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'models\reeducator_vicuna-13b-free\pytorch_model.bin' at 'models\reeducator_vicuna-13b-free\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I made a separate folder for your model, where I also put various files from the default vicuna-13b, such as config.json and tokenizer_config.json.

I also renamed the model to pytorch_model.bin, since its original filename didn't work (I got this error: OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models\reeducator_vicuna-13b-free).

What am I doing wrong?
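
For context, the first UnpicklingError usually means the renamed file isn't a regular PyTorch checkpoint at all. Here is a minimal sketch to check that outside the webui; the path is taken from the traceback above and is an assumption about your layout:

import torch

# Path taken from the traceback above; adjust if your layout differs.
path = r"models\reeducator_vicuna-13b-free\pytorch_model.bin"

# Recent torch.save() checkpoints are zip archives starting with "PK\x03\x04";
# a GGML/quantized file renamed to pytorch_model.bin will not have that header.
with open(path, "rb") as f:
    header = f.read(4)
print("zip-style torch checkpoint:", header == b"PK\x03\x04")

# Trying torch.load directly should reproduce the same UnpicklingError
# if the file is not a real PyTorch checkpoint.
try:
    state_dict = torch.load(path, map_location="cpu")
    print("loaded a state dict with", len(state_dict), "entries")
except Exception as exc:
    print("not loadable as a PyTorch checkpoint:", exc)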

I managed to make it work with llama.cpp in text-generation-webui.
I made folders vicuna-13b-free and vicuna-13b-free-4bit and put the downloaded models there. Then I added "ggml-" to the name of each model, e.g. ggml-vicuna-13b-free-q4_0.bin, and I run it with the command: call python server.py --auto-devices --chat --model vicuna-13b-free-4bit
The problem is that, as far as I understand, llama.cpp only runs on the CPU. Is there any way to run it on the GPU?
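
As a side note, a GGML file like this can also be sanity-checked outside the webui with the llama-cpp-python bindings (pip install llama-cpp-python). A minimal CPU-only sketch; the path is an assumption based on the folder layout described above:

from llama_cpp import Llama

# Path is an assumption based on the folders described above.
llm = Llama(model_path="models/vicuna-13b-free-4bit/ggml-vicuna-13b-free-q4_0.bin")

# Simple completion to confirm the file loads and generates.
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])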

The new file vicuna-13b-free-4bit-128g.safetensors did the trick. Now it works on the GPU.
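
If anyone wants to verify the download before pointing the webui at it, the safetensors header can be read cheaply without loading the weights. A small sketch, with the path as an assumption:

from safetensors import safe_open

# Path is an assumption; adjust to your models directory.
path = "models/reeducator_vicuna-13b-free/vicuna-13b-free-4bit-128g.safetensors"

# safe_open only parses the header, so this is fast even for a 13B model.
with safe_open(path, framework="pt", device="cpu") as f:
    names = list(f.keys())
print(len(names), "tensors; first few:", names[:5])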

Kelheor changed discussion status to closed
