Why is this model demanding that I set trust_remote_code to True?

by Nafnlaus

And where is this remote code?

This model was created when the Falcon code was still remote and not yet natively integrated into the transformers library.
You can ignore trust_remote_code; with the latest transformers release it should still work.

The remote code points to the model's original repo.

I can't ignore it - I can't run it without enabling trust_remote_code.

Then please open an issue on the transformers GitHub; this is not model-specific.

Unfortunately, it does appear to be model-specific. I looked at the transformers code: it checks for an auto_map entry containing AutoConfig in the config to decide whether there's remote code.

    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]

You have an auto_map section with AutoConfig in config.json:

"auto_map": {
"AutoConfig": "OpenAssistant/falcon-40b-sft-mix-1226--configuration_RW.RWConfig",
"AutoModel": "OpenAssistant/falcon-40b-sft-mix-1226--modelling_RW.RWModel",
"AutoModelForCausalLM": "OpenAssistant/falcon-40b-sft-mix-1226--modelling_RW.RWForCausalLM",
"AutoModelForQuestionAnswering": "OpenAssistant/falcon-40b-sft-mix-1226--modelling_RW.RWForQuestionAnswering",
"AutoModelForSequenceClassification": "OpenAssistant/falcon-40b-sft-mix-1226--modelling_RW.RWForSequenceClassification",
"AutoModelForTokenClassification": "OpenAssistant/falcon-40b-sft-mix-1226--modelling_RW.RWForTokenClassification"
},

Just deleted this section; please try again.
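
For anyone with an already-downloaded copy, the equivalent local edit is roughly this (a sketch; the path is a placeholder):

    import json

    # Placeholder path: the local copy of the model's config.json.
    cfg_path = "models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq/config.json"

    with open(cfg_path) as f:
        cfg = json.load(f)

    # Removing auto_map stops transformers from treating this as remote code.
    cfg.pop("auto_map", None)

    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)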

[meme@chmmr text-generation-webui]$ python server.py --model models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq --listen --verbose --api --xformers --n-gpu-layers 10000000000 --loader exllama --max_seq_len 2048
[2023-10-06 20:49:34,668] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-10-06 20:49:35.080375: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-06 20:49:35.743046: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-06 20:49:36 INFO:Loading settings from settings.json...
2023-10-06 20:49:36 INFO:Loading flozi00_OpenAssistant-falcon-40B-4-bits-autogptq...
Traceback (most recent call last):
  File "/home/user/text-generation-webui/server.py", line 222, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 326, in ExLlama_loader
    model, tokenizer = ExllamaModel.from_pretrained(model_name)
  File "/home/user/text-generation-webui/modules/exllama.py", line 55, in from_pretrained
    config = ExLlamaConfig(str(model_config_path))
  File "/home/user/.local/lib/python3.10/site-packages/exllama/model.py", line 56, in __init__
    self.intermediate_size = read_config["intermediate_size"]
KeyError: 'intermediate_size'
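
(That KeyError is expected: ExLlama only supports Llama-architecture models, and Falcon's config has no intermediate_size key. A quick way to see which Llama-style keys the config actually exposes; a sketch with a placeholder path and an illustrative key list:)

    import json

    cfg_path = "models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq/config.json"  # placeholder path
    with open(cfg_path) as f:
        cfg = json.load(f)

    # Illustrative subset of the Llama-style keys ExLlama's config loader reads.
    for key in ("hidden_size", "intermediate_size", "num_attention_heads", "num_hidden_layers"):
        print(key, "->", cfg.get(key, "MISSING"))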

[text-generation-webui]$ python server.py --model models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq --listen --verbose --api --xformers --n-gpu-layers 10000000000 --loader autogptq --triton
[2023-10-06 20:50:02,122] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-10-06 20:50:02.528940: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-06 20:50:03.190506: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-06 20:50:04 INFO:Loading settings from settings.json...
2023-10-06 20:50:04 INFO:Loading flozi00_OpenAssistant-falcon-40B-4-bits-autogptq...
2023-10-06 20:50:04 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': True, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': False}
Traceback (most recent call last):
  File "/home/user/text-generation-webui/server.py", line 222, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 320, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/home/user/text-generation-webui/modules/AutoGPTQ_loader.py", line 57, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 87, in from_quantized
    model_type = check_and_get_model_type(model_name_or_path, trust_remote_code)
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 147, in check_and_get_model_type
    config = AutoConfig.from_pretrained(model_dir, trust_remote_code=trust_remote_code)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1050, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 748, in __getitem__
    raise KeyError(key)
KeyError: 'RefinedWeb'
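
(Here the problem is that the config still declares model_type "RefinedWeb", which no built-in transformers config class is registered under; the native Falcon integration uses model_type "falcon". A commonly suggested workaround, assuming a transformers version with native Falcon support, is to remap it; just a sketch, and whether the quantized weights then load cleanly is a separate question:)

    import json

    cfg_path = "models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq/config.json"  # placeholder path
    with open(cfg_path) as f:
        cfg = json.load(f)

    # Assumption: your transformers release registers the native Falcon config
    # under model_type "falcon".
    cfg["model_type"] = "falcon"

    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)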

[text-generation-webui]$ python server.py --model models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq --listen --verbose --api --xformers --loader gptq-for-llama --model_type OPT
[2023-10-06 20:50:21,082] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-10-06 20:50:21.487741: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-06 20:50:22.148085: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-06 20:50:23 INFO:Loading settings from settings.json...
2023-10-06 20:50:23 INFO:Loading flozi00_OpenAssistant-falcon-40B-4-bits-autogptq...
2023-10-06 20:50:23 INFO:Found the following quantized model: models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq/model.safetensors
Traceback (most recent call last):
  File "/home/user/text-generation-webui/server.py", line 222, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/user/text-generation-webui/modules/models.py", line 312, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "/home/user/text-generation-webui/modules/GPTQ_loader.py", line 144, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/home/user/text-generation-webui/modules/GPTQ_loader.py", line 26, in _load_quant
    config = AutoConfig.from_pretrained(model, trust_remote_code=shared.args.trust_remote_code)
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1050, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 748, in __getitem__
    raise KeyError(key)
KeyError: 'RefinedWeb'
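
(Same root cause in all three loaders: without trust_remote_code, AutoConfig can't resolve model_type "RefinedWeb". After deleting auto_map and, if your transformers has native Falcon support, remapping model_type as sketched above, a quick sanity check; placeholder path again:)

    from transformers import AutoConfig

    model_dir = "models/flozi00_OpenAssistant-falcon-40B-4-bits-autogptq"  # placeholder path
    cfg = AutoConfig.from_pretrained(model_dir, trust_remote_code=False)
    print(type(cfg).__name__)  # expect FalconConfig once model_type is recognized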
