Having trouble loading this in ooba
#1 opened by Efaarts
Getting this on load
Traceback (most recent call last):
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\modules\models.py", line 93, in load_model
    tokenizer = load_tokenizer(model_name, model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\modules\models.py", line 113, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 751, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\installer_files\env\Lib\site-packages\transformers\dynamic_module_utils.py", line 487, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\installer_files\env\Lib\site-packages\transformers\dynamic_module_utils.py", line 293, in get_cached_module_file
    resolved_module_file = cached_file(
                           ^^^^^^^^^^^^
  File "C:\Projects\MachineLearning\LLMWebUI\text-generation-webui\installer_files\env\Lib\site-packages\transformers\utils\hub.py", line 401, in cached_file
    raise EnvironmentError(
OSError: models\Yi-34B-GiftedConvo-merged-6.0bpw-h6-exl2 does not appear to have a file named tokenization_yi.py. Checkout 'https://huggingface.co/models\Yi-34B-GiftedConvo-merged-6.0bpw-h6-exl2/None' for available files.
I had a similar error when loading the 5.0bpw in Exllamav2_HF.
But it loaded okay in Exllamav2.
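The _HF loader hits this because it builds the tokenizer through transformers' AutoTokenizer, and the exl2 quant folder doesn't include the custom tokenization_yi.py that tokenizer_config.json points to. If you want the _HF loader anyway, one workaround is to drop that file into the local model folder yourself. A rough sketch below, assuming the tokenizer code from the upstream 01-ai/Yi-34B repo matches this quant; the repo id and local path are my guesses, so adjust them to your setup:

```python
# Sketch: fetch the custom tokenizer code into the webui model folder so
# AutoTokenizer (used by the Exllamav2_HF loader) can find it.
# Assumptions: the upstream repo "01-ai/Yi-34B" ships tokenization_yi.py and
# the local folder name matches the one in the error message.
from huggingface_hub import hf_hub_download

local_model_dir = r"models\Yi-34B-GiftedConvo-merged-6.0bpw-h6-exl2"

hf_hub_download(
    repo_id="01-ai/Yi-34B",          # assumed source of tokenization_yi.py
    filename="tokenization_yi.py",
    local_dir=local_model_dir,       # place it next to tokenizer_config.json
)
```

After the file is in the model folder, try reloading with Exllamav2_HF again.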
Also, please make sure your Exllamav2 is updated to the latest version.
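If you're not sure which exllamav2 build the webui environment actually has, here's a quick check you can run from the webui's Python env; it just reads the installed package metadata:

```python
# Print the exllamav2 version installed in the current environment.
from importlib.metadata import version

print(version("exllamav2"))
```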
I never load with Exllamav2_HF since I don't need any of the extra options. I tried the 8.0bpw model on my 2x4090 system and inference is really slow, around 7 t/s. The 4.0bpw model runs at 23 t/s and doesn't seem any different in terms of inference quality.