Error when attempting to run: appears model files are missing, or a configuration issue

#6
by jdc4429 - opened

Good morning,

I have been trying to get the model running and can't figure out why it states files are missing even though I downloaded all the files. Here is the error:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/falcon-40b-instruct-GPTQ.

How do you tell it to use .safetensors? Why does it not see the file by default?

Regards,

Jeff

Did you launch text-generation-webui with the --autogptq flag? If not, please tick "AutoGPTQ" under model parameters, then "Save settings for this model" and "reload this model"

And you're right it should be able to detect this automatically. I will discuss that with oobabooga.
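In other words, the launch line would look something like this (just a sketch; the other flags are whatever you normally use):

python server.py --autogptq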

Yes, I added the option right in the webui.py file where I have all the other options. Should be fine in that regard.

CMD_FLAGS = '--chat --model-menu --gpu-memory 6800MiB 11000MiB 11000MiB --cpu-memory 64 --share --trust-remote-code --autogptq'

Argh... I had compiled AutoGPTQ, and I thought it compiled without errors for 11.7. Now when I try to compile again, it's stating:

The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.

FAILED: /home/jeff/AutoGPTQ/build/temp.linux-x86_64-3.10/autogptq_cuda/autogptq_cuda_kernel.o

I'm not sure why it's stating I'm using 11.5...

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
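(For reference: nvidia-smi reports the driver's CUDA version, while the compile step picks up whichever CUDA toolkit nvcc points at, so the 11.5 most likely comes from the toolkit on PATH. Generic checks, nothing specific to this machine:)

which nvcc
nvcc --version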

OK, never mind. Edited: I ran python setup.py clean and then built again, and it compiled OK. Let's try this again. :)
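For reference, the rebuild sequence from the AutoGPTQ checkout would be roughly this (the install step is an assumption; the post above only mentions clean and build):

python setup.py clean
python setup.py build
python setup.py install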

I am still getting an error stating the module can't be found, even though I did build and install???

ModuleNotFoundError: No module named 'auto_gptq'

So I tried pip install auto_gptq, which installed accelerate.

But I'm still getting the error...

ModuleNotFoundError: No module named 'auto_gptq'

Full error:

Traceback (most recent call last):
File "/home/jeff/oobabooga_linux/text-generation-webui/server.py", line 1087, in
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/home/jeff/oobabooga_linux/text-generation-webui/modules/models.py", line 95, in load_model
output = load_func(model_name)
File "/home/jeff/oobabooga_linux/text-generation-webui/modules/models.py", line 297, in AutoGPTQ_loader
import modules.AutoGPTQ_loader
File "/home/jeff/oobabooga_linux/text-generation-webui/modules/AutoGPTQ_loader.py", line 3, in
from auto_gptq import AutoGPTQForCausalLM
ModuleNotFoundError: No module named 'auto_gptq'
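A few generic checks that would show whether pip put auto_gptq into the same environment server.py runs from (a guess at the diagnosis, not verified here):

which python
python -m pip show auto_gptq
python -c "import auto_gptq; print(auto_gptq.__file__)"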

I also tried, as you suggested, selecting it from within the interface (auto_gptq), and it does not give me an error, but I am back to the problem of it not detecting the .safetensors file. :)

Traceback (most recent call last):
File "/home/jeff/oobabooga_linux/text-generation-webui/server.py", line 71, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/home/jeff/oobabooga_linux/text-generation-webui/modules/models.py", line 95, in load_model
output = load_func(model_name)
File "/home/jeff/oobabooga_linux/text-generation-webui/modules/models.py", line 225, in huggingface_loader
model = LoaderClass.from_pretrained(checkpoint, **params)
File "/home/jeff/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 466, in from_pretrained
return model_class.from_pretrained(
File "/home/jeff/oobabooga_linux/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2405, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/falcon-40b-instruct-GPTQ.

Is it possible the name of the safetensors file is causing the issue? Does it have to have .4bit at the end or something?

This error means it's still not loading it with AutoGPTQ

If the checkbox doesn't work, can you please launch server.py with --autogptq --trust_remote_code specifically?

And if that doesn't work, please show me a screenshot of the contents of the TheBloke_falcon-40b-instruct-GPTQ model folder
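If it helps to take the webui out of the equation, a direct load with AutoGPTQ would look roughly like this; the model_basename is my guess at the .safetensors file name minus its extension, and this is only a sketch:

python -c "from auto_gptq import AutoGPTQForCausalLM; model = AutoGPTQForCausalLM.from_quantized('models/falcon-40b-instruct-GPTQ', model_basename='gptq_model-4bit--1g', use_safetensors=True, trust_remote_code=True, device='cuda:0'); print('loaded:', type(model).__name__)"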

Got an error stating module chardet was not found; it was not in requirements, so I did a pip install to fix that.
Then got an error: no module markdown. Installed via pip.

Now I get another error:

ImportError: cannot import name 'storage_ptr' from 'safetensors.torch'

[screenshot attached: image.png]
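(For the record, that ImportError usually means the installed safetensors is older than what the calling library expects; upgrading it inside the same environment is the usual fix, though that's an assumption rather than something confirmed in this thread:)

pip install --upgrade safetensors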

I have now added the auto_gptq setting to the model settings from the interface. I think the only issue is ooba not detecting the .safetensors file,
which may be something to do with the filename.

But it seems like it takes 10 errors before you get something working in Ubuntu, so maybe this isn't the last issue. lol

I have avoided downloading safetensors versions for just this reason. lol
Would it be possible for you to put Falcon 40B up in another format? :)

Your installation is messed up or incomplete somehow. You shouldn't need to manually install pip packages.

Please go to the text-generation-webui directory and run: pip install -r requirements.txt

Then try again.

The issue might be that you're using the text-gen-ui one-click installer, which I believe creates a Python conda environment, but you're now not using that environment.

If in doubt, please start again:

cd /some/folder
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
cd ..
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
cd ../text-generation-webui
mv ~/oobabooga_linux/text-generation-webui/models/TheBloke_falcon* models/
python server.py --autogptq --trust-remote-code
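After that, a quick sanity check that the environment has a CUDA build of torch and can import auto_gptq (just a sketch):

python -c "import torch, auto_gptq; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"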

Also: just checking you know that you won't be able to load this model unless you have 2 x 24GB GPUs, or 1 x 48GB GPU?

I did run the requirements. I just figured some things were not listed.
My other models work. In fact, I just loaded Falcon 7B without issue. (Falcon 7B takes 14 GB, but it's running with an 8 GB GPU + CPU.)
You don't need all of it in VRAM. I have an RTX 2070 and a K80 (not installed), and I'm waiting on a P40 ATM. I have 72 GB of RAM, so I can run it with 30 GB of VRAM plus CPU if it needs more than that. 4-bit would not need that much, which is why I wanted to try it even though it's slow ATM. I plan on getting another P40, which would get me to around 54 GB of VRAM, but I'd also need to upgrade to a motherboard with 3x PCIe x16 slots to hold 3 cards.

I will try installing again in another directory just in case. But my only issue now might just be that it doesn't recognize the .safetensors file to load it.

Everything you need to run text-gen-ui is in its requirements.txt. If you're having problems, it's not because requirements weren't installed; it's because you installed them in a different environment from the one you're using, or something like that.

If Falcon-7B worked then please try running Falcon 40B again, in exactly the same way, and show me everything you see on screen

I tried from scratch as suggested, also compiled bitsandbytes from scratch.

Got a new error:

RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda
functions before calling NumHipDevices() that might have already set an error? Error
101: hipErrorInvalidDevice

When running as instructed: python server.py --autogptq --trust-remote-code

Here is the whole Traceback:

Also got some warnings.. Not sure why.

WARNING:The AutoGPTQ params are: {'model_basename': 'gptq_model-4bit--1g', 'device': 'cuda:0', 'use_triton': False, 'use_safetensors': True, 'trust_remote_code': True, 'max_memory': None}
WARNING:CUDA extension not installed.
WARNING:The safetensors archive passed at models/falcon-40b-instruct-GPTQ/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

╭──────────────────────── Traceback (most recent call last) ────────────────────────╮
│ /home/jeff/TEST/text-generation-webui/server.py:1094 in │
│ │
│ 1091 │ │ update_model_parameters(model_settings, initial=True) # hijacking │
│ 1092 │ │ │
│ 1093 │ │ # Load the model │
│ ❱ 1094 │ │ shared.model, shared.tokenizer = load_model(shared.model_name) │
│ 1095 │ │ if shared.args.lora: │
│ 1096 │ │ │ add_lora_to_model(shared.args.lora) │
│ 1097 │
│ │
│ /home/jeff/TEST/text-generation-webui/modules/models.py:97 in load_model │
│ │
│ 94 │ else: │
│ 95 │ │ load_func = huggingface_loader │
│ 96 │ │
│ ❱ 97 │ output = load_func(model_name) │
│ 98 │ if type(output) is tuple: │
│ 99 │ │ model, tokenizer = output │
│ 100 │ else: │
│ │
│ /home/jeff/TEST/text-generation-webui/modules/models.py:299 in AutoGPTQ_loader │
│ │
│ 296 def AutoGPTQ_loader(model_name): │
│ 297 │ import modules.AutoGPTQ_loader │
│ 298 │ │
│ ❱ 299 │ return modules.AutoGPTQ_loader.load_quantized(model_name) │
│ 300 │
│ 301 │
│ 302 def get_max_memory_dict(): │
│ │
│ /home/jeff/TEST/text-generation-webui/modules/AutoGPTQ_loader.py:43 in │
│ load_quantized │
│ │
│ 40 │ } │
│ 41 │ │
│ 42 │ logger.warning(f"The AutoGPTQ params are: {params}") │
│ ❱ 43 │ model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params) │
│ 44 │ return model │
│ 45 │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling │
│ /auto.py:82 in from_quantized │
│ │
│ 79 │ │ model_type = check_and_get_model_type(save_dir or model_name_or_pat │
│ 80 │ │ quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized │
│ 81 │ │ keywords = {key: kwargs[key] for key in signature(quant_func).param │
│ ❱ 82 │ │ return quant_func( │
│ 83 │ │ │ model_name_or_path=model_name_or_path, │
│ 84 │ │ │ save_dir=save_dir, │
│ 85 │ │ │ device_map=device_map, │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling │
│ /base.py:773 in from_quantized │
│ │
│ 770 │ │ if low_cpu_mem_usage: │
│ 771 │ │ │ make_sure_no_tensor_in_meta_device(model, use_triton, quantize │
│ 772 │ │ │
│ ❱ 773 │ │ accelerate.utils.modeling.load_checkpoint_in_model( │
│ 774 │ │ │ model, │
│ 775 │ │ │ checkpoint=model_save_name, │
│ 776 │ │ │ device_map=device_map, │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/accelerate/utils/m │
│ odeling.py:998 in load_checkpoint_in_model │
│ │
│ 995 │ buffer_names = [name for name, _ in model.named_buffers()] │
│ 996 │ │
│ 997 │ for checkpoint_file in checkpoint_files: │
│ ❱ 998 │ │ checkpoint = load_state_dict(checkpoint_file, device_map=device_ma │
│ 999 │ │ if device_map is None: │
│ 1000 │ │ │ model.load_state_dict(checkpoint, strict=False) │
│ 1001 │ │ else: │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/accelerate/utils/m │
│ odeling.py:859 in load_state_dict │
│ │
│ 856 │ │ │ │
│ 857 │ │ │ # if we only have one device we can load everything directly │
│ 858 │ │ │ if len(devices) == 1: │
│ ❱ 859 │ │ │ │ return safe_load_file(checkpoint_file, device=devices[0]) │
│ 860 │ │ │ │
│ 861 │ │ │ # cpu device should always exist as fallback option │
│ 862 │ │ │ if "cpu" not in devices: │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/safetensors/torch. │
│ py:261 in load_file │
│ │
│ 258 │ result = {} │
│ 259 │ with safe_open(filename, framework="pt", device=device) as f: │
│ 260 │ │ for k in f.keys(): │
│ ❱ 261 │ │ │ result[k] = f.get_tensor(k) │
│ 262 │ return result │
│ 263 │
│ 264 │
│ │
│ /home/jeff/anaconda3/envs/textgen/lib/python3.10/site-packages/torch/cuda/__init__.py:229 in _lazy_init │
│ │
│ 226 │ │ # are found or any other error occurs │
│ 227 │ │ if 'CUDA_MODULE_LOADING' not in os.environ: │
│ 228 │ │ │ os.environ['CUDA_MODULE_LOADING'] = 'LAZY' │
│ ❱ 229 │ │ torch._C._cuda_init() │
│ 230 │ │ # Some of the queued calls may reentrantly call _lazy_init(); │
│ 231 │ │ # we need to just return without initializing in that case. │
│ 232 │ │ # However, we must not let any other threads in! │
╰───────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda
functions before calling NumHipDevices() that might have already set an error? Error
101: hipErrorInvalidDevice

Wow, your install is really messed up somehow.

I really don't know what you've done to get into this position. I think HIP is normally used with AMD GPUs (ROCm).
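A quick way to confirm whether a ROCm (HIP) build of PyTorch ended up in the environment instead of a CUDA build; torch.version.hip is set on ROCm builds and None on CUDA builds (generic check):

python -c "import torch; print(torch.__version__, 'cuda:', torch.version.cuda, 'hip:', torch.version.hip)"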

I suggest making a new conda environment and starting from scratch:

  1. Make a new conda environment and activate it (the full sequence is sketched below)
  2. Install the appropriate PyTorch build for your CUDA toolkit version. Hopefully you have CUDA Toolkit 11.x installed, in which case you can do:
pip install torch --index-url https://download.pytorch.org/whl/cu118
  3. Then run pip install -r requirements.txt in text-generation-webui again
  4. And run pip install . in AutoGPTQ again.
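Putting those steps together, the whole sequence would look roughly like this (the env name textgen and the cu118 wheel are assumptions based on the paths and toolkit versions discussed above):

conda create -n textgen python=3.10
conda activate textgen
pip install torch --index-url https://download.pytorch.org/whl/cu118
cd text-generation-webui
pip install -r requirements.txt
cd ../AutoGPTQ
pip install .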

Do not compile bitsandbytes from scratch. text-gen-ui installs the latest version from pip automatically.

This was all from a new environment..

It's just typical Linux.. fix one thing, break 10 others

I started in Linux when it was Slackware. It's been like 40 years and still the same BS.

I was on 11.7. I am trying to get to 11.8, but the stupid thing never works. I have everything installed, but nvidia-smi states an incompatible version, even though I installed utilities-520 and driver 520 and CUDA Toolkit 11.8!!! It's so annoying. So then I tried mixing, trying 515 and 510, but of course they don't work.

So now do I try 12.1? Because, like everything else, I pretty much figure it will screw up everything that used to work in 11.7.

I wouldn't try 12.1; I've heard of weird performance bugs. If you want to go 12.x, I'd try 12.0.1. It's what I'm using.

Though be aware you will then have to compile PyTorch from source, as there are no pre-compiled binaries for 12.x yet. So that's extra work and takes a little while, usually around an hour.

Or just go back to 11.7 if that worked OK for you

Thanks! I tried 12.0.0 and my RTX 2070 does not get detected. ARG. I am back at 11.7 but will try 12.0.1... The RTX 2070 can't be that outdated already, right?

I swear I searched for 12.0.1 (and 12.0 update 1) and it kept showing me 12.1!

It seems none of the drivers past 11.7 want to work with my RTX 2070, or else the module is not compatible with my kernel version?
Every driver past 515 gives me an error on boot that it can't find the Nvidia card. I don't have the exact error, but it happened on all 12.x versions (tried 12.0, 12.0.1, 12.1.1).
I guess I'm stuck on 11.7.
I may have the model working now, though, but I need the P40 before I can test; it gives out of memory trying on 8 GB. :) It does not allow splitting with the CPU, I guess because of AutoGPTQ.
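For what it's worth, the earlier warning showed 'max_memory': None; the webui builds that dict from --gpu-memory / --cpu-memory, so a launch along these lines might let it spill layers to CPU RAM (the values are placeholders, and whether AutoGPTQ offload actually works for this model is untested):

python server.py --autogptq --trust-remote-code --gpu-memory 7000MiB --cpu-memory 64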
