Error: Internal: src/sentencepiece_processor.cc in Ooba and KAI 4bit

#8
by Co0ode - opened

Hi, so in both KoboldAI (Occam's fork) and the 4-bit-capable ooba, I'm getting the same error. It's unclear what is causing it: the machine I have not updated is still working fine, but every machine I have tried a fresh install on fails to load the model. Can you please confirm what might be causing this issue? It's not apparent to me what changed.
Internal: src/sentencepiece_processor.cc

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
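This error means sentencepiece could not parse `tokenizer.model` as a serialized model protobuf, i.e. the file on disk is not a valid SentencePiece model. With fresh clones, one common cause is that `tokenizer.model` is a Git LFS pointer stub or a truncated download rather than the real binary. A stdlib-only sanity check (a sketch; the helper names are my own, not part of either repo):

```python
from pathlib import Path

def looks_like_lfs_pointer(path: str) -> bool:
    """Return True if the file is a Git LFS pointer stub rather than a
    real SentencePiece model (real models are binary protobufs)."""
    data = Path(path).read_bytes()
    return data.startswith(b"version https://git-lfs.github.com/spec/")

def quick_check(path: str) -> None:
    """Print a rough verdict on a tokenizer.model file."""
    p = Path(path)
    if not p.exists():
        print(f"{path}: missing")
    elif looks_like_lfs_pointer(path):
        print(f"{path}: Git LFS pointer stub -- try `git lfs pull`")
    elif p.stat().st_size < 100_000:
        print(f"{path}: suspiciously small ({p.stat().st_size} bytes)")
    else:
        print(f"{path}: plausible ({p.stat().st_size} bytes)")
```

If the file checks out here but the error persists, the mismatch is more likely in the library versions than in the file itself.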

From Ooba

Traceback (most recent call last):
File "/workspace/text-generation-webui/server.py", line 921, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/workspace/text-generation-webui/modules/models.py", line 229, in load_model
tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
File "/workspace/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
File "/workspace/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/workspace/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
File "/workspace/miniconda3/envs/textgen/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/workspace/miniconda3/envs/textgen/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

From KAI

sentencepiece_processor.cc(923) LOG(ERROR) src/sentencepiece_processor.cc(290) [model_] Model is not initialized.
Returns default value 0
ERROR | main:g:609 - An error has been caught in function 'g', process 'MainProcess' (359), thread 'MainThread' (140485838542656):
Traceback (most recent call last):

File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/eventlet/green/thread.py", line 43, in __thread_body
func(*args, **kwargs)
β”‚ β”‚ β”” {}
β”‚ β”” ()
β”” <bound method Thread._bootstrap of <Thread(Thread-62, started daemon 140480538463168)>>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
β”‚ β”” <function start_new_thread..wrap_bootstrap_inner at 0x7fc42c476c10>
β”” <Thread(Thread-62, started daemon 140480538463168)>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/eventlet/green/thread.py", line 64, in wrap_bootstrap_inner
bootstrap_inner()
β”” <bound method Thread._bootstrap_inner of <Thread(Thread-62, started daemon 140480538463168)>>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
β”‚ β”” <function Thread.run at 0x7fc5669b9940>
β”” <Thread(Thread-62, started daemon 140480538463168)>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
β”‚ β”‚ β”‚ β”‚ β”‚ β”” {}
β”‚ β”‚ β”‚ β”‚ β”” <Thread(Thread-62, started daemon 140480538463168)>
β”‚ β”‚ β”‚ β”” (<socketio.server.Server object at 0x7fc43000d250>, '4nVKA6iQiTr-FeLUAAAB', 'FHLMbFRf-y0wxsPJAAAA', ['load_model', {'model': ...
β”‚ β”‚ β”” <Thread(Thread-62, started daemon 140480538463168)>
β”‚ β”” <bound method Server._handle_event_internal of <socketio.server.Server object at 0x7fc43000d250>>
β”” <Thread(Thread-62, started daemon 140480538463168)>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 731, in _handle_event_internal
r = server._trigger_event(data[0], namespace, sid, *data[1:])
β”‚ β”‚ β”‚ β”‚ β”‚ β”” ['load_model', {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': ''...
β”‚ β”‚ β”‚ β”‚ β”” '4nVKA6iQiTr-FeLUAAAB'
β”‚ β”‚ β”‚ β”” '/'
β”‚ β”‚ β”” ['load_model', {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': ''...
β”‚ β”” <function Server._trigger_event at 0x7fc430e2f160>
β”” <socketio.server.Server object at 0x7fc43000d250>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 756, in _trigger_event
return self.handlers[namespace][event](*args)
β”‚ β”‚ β”‚ β”‚ β”” ('4nVKA6iQiTr-FeLUAAAB', {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True,...
β”‚ β”‚ β”‚ β”” 'load_model'
β”‚ β”‚ β”” '/'
β”‚ β”” {'/': {'get_model_info': <function get_model_info at 0x7fc42cd30430>, 'OAI_Key_Update': <function get_oai_models at 0x7fc42cd...
β”” <socketio.server.Server object at 0x7fc43000d250>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 282, in _handler
return self._handle_event(handler, message, namespace, sid,
β”‚ β”‚ β”‚ β”‚ β”‚ β”” '4nVKA6iQiTr-FeLUAAAB'
β”‚ β”‚ β”‚ β”‚ β”” '/'
β”‚ β”‚ β”‚ β”” 'load_model'
β”‚ β”‚ β”” <function UI_2_load_model at 0x7fc42cceedc0>
β”‚ β”” <function SocketIO._handle_event at 0x7fc4309d1ee0>
β”” <flask_socketio.SocketIO object at 0x7fc43000d2b0>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 828, in _handle_event
ret = handler(*args)
β”‚ β”” ({'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers'...
β”” <function UI_2_load_model at 0x7fc42cceedc0>

File "aiserver.py", line 609, in g
return f(*a, **k)
β”‚ β”‚ β”” {}
β”‚ β”” ({'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers'...
β”” <function UI_2_load_model at 0x7fc42cceeaf0>

File "aiserver.py", line 8960, in UI_2_load_model
load_model(use_gpu=data['use_gpu'], gpu_layers=data['gpu_layers'], disk_layers=data['disk_layers'], online_model=data['online_model'], url=koboldai_vars.colaburl, use_8_bit=data['use_8_bit'], use_4_bit=data['use_4_bit'])
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”‚ β”‚ β”‚ β”‚ β”‚ β”” <koboldai_settings.koboldai_vars object at 0x7fc42ff75610>
β”‚ β”‚ β”‚ β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”‚ β”‚ β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”‚ β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”‚ β”” {'model': 'NeoCustom', 'path': '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', 'use_gpu': True, 'key': '', 'gpu_layers':...
β”” <function load_model at 0x7fc42cd30af0>

File "aiserver.py", line 3231, in load_model
tokenizer = LlamaTokenizer.from_pretrained(koboldai_vars.custmodpth)
β”‚ β”‚ β”” <koboldai_settings.koboldai_vars object at 0x7fc42ff75610>
β”‚ β”” <classmethod object at 0x7fc4309d6430>
β”” <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>

File "aiserver.py", line 128, in new_pretrainedtokenizerbase_from_pretrained
tokenizer = old_pretrainedtokenizerbase_from_pretrained(cls, *args, **kwargs)
β”‚ β”‚ β”‚ β”” {}
β”‚ β”‚ β”” ('/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4',)
β”‚ β”” <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>
β”” <function PreTrainedTokenizerBase.from_pretrained at 0x7fc431cb7040>

File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
β”‚ β”” <classmethod object at 0x7fc431cb8040>
β”” <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
β”‚ β”‚ β”” {'add_bos_token': True, 'add_eos_token': False, 'bos_token': AddedToken("", rstrip=False, lstrip=False, single_word=False,...
β”‚ β”” ()
β”” <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
β”‚ β”‚ β”‚ β”” '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4/tokenizer.model'
β”‚ β”‚ β”” <function SentencePieceProcessor.Load at 0x7fc4345cca60>
β”‚ β”” <sentencepiece.SentencePieceProcessor; proxy of <Swig Object of type 'sentencepiece::SentencePieceProcessor *' at 0x7fc42c243...
β”” LlamaTokenizer(name_or_path='/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4', vocab_size=0, model_max_length=1000000000000...
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
β”‚ β”‚ β”” '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4/tokenizer.model'
β”‚ β”” <function SentencePieceProcessor.LoadFromFile at 0x7fc4345ca5e0>
β”” <sentencepiece.SentencePieceProcessor; proxy of <Swig Object of type 'sentencepiece::SentencePieceProcessor *' at 0x7fc42c243...
File "/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
β”‚ β”‚ β”‚ β”” '/workspace/KoboldAI/models/GPT4-X-Alpaca-30B-Int4/tokenizer.model'
β”‚ β”‚ β”” <sentencepiece.SentencePieceProcessor; proxy of <Swig Object of type 'sentencepiece::SentencePieceProcessor *' at 0x7fc42c243...
β”‚ β””
β”” <module 'sentencepiece._sentencepiece' from '/workspace/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepi...

RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
sentencepiece_processor.cc(923) LOG(ERROR) src/sentencepiece_processor.cc(290) [model_] Model is not initialized.

Rolled back to a version from 3-4 days ago and it works again. Thanks

Thanks for posting that, very helpful!

I just tested with the latest version of ooba and it works. Make sure you're using the old cuda branch. More info here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md Also, if you follow the instructions in his repo, it should work as-is.
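To be clear about what "old cuda branch" refers to (assuming the layout the linked doc describes): text-generation-webui itself stays on its latest version; it is the separate GPTQ-for-LLaMa clone under `repositories/` that sits on the `cuda` branch. A hypothetical helper to confirm which branch a local checkout is on:

```python
import subprocess

def current_branch(repo_dir: str) -> str:
    """Return the branch name a local git checkout is on, e.g. 'cuda'.

    Uses `git symbolic-ref --short HEAD`, which works even before the
    first commit.
    """
    out = subprocess.run(
        ["git", "-C", repo_dir, "symbolic-ref", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

# e.g. current_branch("repositories/GPTQ-for-LLaMa") should report "cuda"
# if the 4-bit setup followed the guide.
```

So the webui and the GPTQ loader are versioned independently; only the latter needs the older branch.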

MetaIX, how can it run on the most recent version and the old cuda branch at the same time? Is that like an add-on, or do I have to install a second, older copy of ooba? I don't really understand how to use the old branch; if you were using it, I don't see how you could also be on the latest version. Sorry, I don't know too much about this. How do I get this model running when I have ooba fully updated at the moment? What is my next step? Thanks.
