[Bug] Does not work

#3
by catid - opened

I used the example script with the latest pytorch, einops, and transformers, but it does not work:

Traceback (most recent call last):
  File "/home/catid/sources/supercharger/test_falcon_basic.py", line 8, in <module>
    pipeline = transformers.pipeline(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Possibly related, I get "The model 'RWForCausalLM' is not supported for text-generation".

I do see that this warning pops up on 7b, which then goes on to work fine, so it might be a misleading warning here; just thought I'd share it.

The model doesn't work. I get the same error on 40B:

ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class
'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>).

Oh good, I thought I was just doing something dumb.

I am able to run the model on my end, but the answer just keeps going and does not end. It is also pretty slow in streaming the response. Running on 4x A10G (96 GB).

model = AutoModelForCausalLM.from_pretrained(mname,trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')

Loading like this, I'm getting an error after one answer:
RuntimeError: The size of tensor a (9) must match the size of tensor b (488) at non-singleton dimension 1
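
A hedged aside on the runaway generation (separate from the size-mismatch error): capping max_new_tokens and passing an explicit EOS/pad token usually bounds the output. A minimal sketch, assuming mname points at whichever Falcon checkpoint you are loading and that there is enough memory for it; the prompt is only illustrative:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

mname = "tiiuae/falcon-40b"  # assumption: same checkpoint as above
tokenizer = AutoTokenizer.from_pretrained(mname)
model = AutoModelForCausalLM.from_pretrained(
    mname, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Daniel: Hello, Girafatron!\nGirafatron:", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=200,                   # hard cap so generation cannot run on forever
    eos_token_id=tokenizer.eos_token_id,  # stop at end-of-sequence
    pad_token_id=tokenizer.eos_token_id,  # Falcon has no pad token; reuse EOS
)
print(tokenizer.decode(output[0], skip_special_tokens=True))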

Have we solved the problem?

Facing the same issue; how do I solve it?

ValueError: Could not load model tiiuae/falcon-7b with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

When using this code:

https://huggingface.co/tiiuae/falcon-40b

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b"  # Optionally use the smaller sibling: tiiuae/falcon-7b

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

I get this output:

Downloading (…)okenizer_config.json: 100% 175/175 [00:00<00:00, 6.61kB/s]
Downloading (…)/main/tokenizer.json: 100% 2.73M/2.73M [00:00<00:00, 5.61MB/s]
Downloading (…)cial_tokens_map.json: 100% 281/281 [00:00<00:00, 1.34kB/s]
Downloading (…)lve/main/config.json: 100% 656/656 [00:00<00:00, 947B/s]
Downloading (…)/configuration_RW.py: 100% 2.51k/2.51k [00:00<00:00, 3.46kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100% 47.1k/47.1k [00:00<00:00, 108kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100% 39.3k/39.3k [00:00<00:00, 697kB/s]
Downloading shards: 67% 6/9 [05:54<02:52, 57.54s/it]
Downloading (…)l-00001-of-00009.bin: 100% 9.50G/9.50G [00:46<00:00, 258MB/s]
Downloading (…)l-00002-of-00009.bin: 100% 9.51G/9.51G [01:14<00:00, 257MB/s]
Downloading (…)l-00003-of-00009.bin: 100% 9.51G/9.51G [00:50<00:00, 262MB/s]
Downloading (…)l-00004-of-00009.bin: 100% 9.51G/9.51G [00:55<00:00, 246MB/s]
Downloading (…)l-00005-of-00009.bin: 100% 9.51G/9.51G [00:57<00:00, 224MB/s]
Downloading (…)l-00006-of-00009.bin: 100% 9.51G/9.51G [00:58<00:00, 170MB/s]
Downloading (…)l-00007-of-00009.bin: 18% 1.74G/9.51G [00:12<00:44, 174MB/s]

Traceback (most recent call last):
  File "<cell line: 13>", line 13, in <module>
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>).


I got the same bug in google colab. Switched to using GPU and then it worked fine.

Now I see this using the V100 GPU on Colab:

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
Setting pad_token_id to eos_token_id:11 for open-end generation.

The same problem (ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>)) happens as well when running with a downloaded model, using the code below:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "X:\\ai\\falcon-40b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    offload_folder="N:\\AI\\offload_folder",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

The following code gets it to run, but it never (5+ min) outputs a result (on a 3090 with 24 GB):


model = "X:\\ai\\falcon-40b"
rrmodel = AutoModelForCausalLM.from_pretrained(model, 
    offload_folder="N:\AI\offload_folder",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",)

tokenizer = AutoTokenizer.from_pretrained(model)

# Define the input text
input_text = "What is girrafe?"
input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)

attention_mask = torch.ones(input_ids.shape).to(device)

# Generate text
output = rrmodel.generate(input_ids,      
            attention_mask=attention_mask,
            max_length=200,
            do_sample=True,
            top_k=10,
            pad_token_id=tokenizer.pad_token_id,
            num_return_sequences=1,
            eos_token_id=tokenizer.eos_token_id,)

# Decode the output
output_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(output_text)

Now it is taking up my entire space on my Colab Pro tier. Does anyone know how much space it is supposed to take? And how long does it take to run on Colab Pro with a V100?


It will take 80 GB of VRAM or so, plus some extra for overhead (it doesn't fit in a single A100, with 80 GB memory). But you can try https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ which should take only 20 GB of VRAM.
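
If you try that GPTQ checkpoint, the usual route at the time was the auto-gptq package; the sketch below is an assumption on my part rather than the model card's exact snippet, so double-check the card for the recommended arguments and versions:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the pre-quantized weights; roughly 20 GB of VRAM per the comment above
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device_map="auto",
    trust_remote_code=True,  # Falcon still ships custom modelling code
    use_safetensors=True,
)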

@maccam912 how much space does it take on the colab?


I got the same bug in google colab. Switched to using GPU and then it worked fine.

I am running this on Google Colab (free version).
When I switch to GPU (T4 GPU runtime in Colab) I still get this error.
I also tried switching to TPU on Colab (which is possible because of the use of the accelerate lib!), and I still get the same error.

Now it is taking up my entire space on my Colab Pro tier. Does anyone know how much space it is supposed to take? And how long does it take to run on Colab Pro with a V100?

It will take 80 GB of VRAM or so, plus some extra for overhead (it doesn't fit in a single A100, with 80 GB memory). But you can try https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ which should take only 20 GB of VRAM.

@leonlahoud @maccam912 Falcon-40B doesn't load on 1x NVIDIA H100 GPU with 80 GB VRAM; Falcon-7B works, though I don't like the answer repetition.

I also have issues running Falcon-40B on 1x H100 GPU with 80 GB of VRAM using 8-bit quantization. It fails with "Exception: cublasLt ran into an error!". I tried with everything built for both CUDA 11.8 and CUDA 12.1, and it still fails, even though bitsandbytes says everything is OK.

Same error at my end also.

 % python falcon-demo.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)l-00007-of-00009.bin: 100%|██████████| 9.51G/9.51G [20:08<00:00, 7.87MB/s]
Downloading (…)l-00008-of-00009.bin: 100%|██████████| 9.51G/9.51G [18:28<00:00, 8.58MB/s]
Downloading (…)l-00009-of-00009.bin: 100%|██████████| 7.58G/7.58G [18:49<00:00, 6.71MB/s]
Downloading shards: 100%|██████████| 9/9 [57:32<00:00, 383.65s/it]
Traceback (most recent call last):
  File "/LLM/falcon-demo.py", line 10, in <module>
    pipeline = transformers.pipeline(
  File "/Users/sumitagrawal/opt/anaconda3/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 779, in pipeline
    framework, model = infer_framework_load_model(
  File "/Users/sumitagrawal/opt/anaconda3/lib/python3.9/site-packages/transformers/pipelines/base.py", line 271, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Reran it, this is the result again.

% python falcon-demo.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "/LLM/falcon-demo.py", line 10, in <module>
    pipeline = transformers.pipeline(
  File "/opt/anaconda3/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 779, in pipeline
    framework, model = infer_framework_load_model(
  File "/opt/anaconda3/lib/python3.9/site-packages/transformers/pipelines/base.py", line 271, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

I can't run this on my machine because I don't have the hardware, but I was able to get past the above errors by adjusting the code as follows, specifically the model = line:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

# The pipeline below still needs a tokenizer; load it from the same repo id
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Does anyone know how to move the .cache file generated by the model data download from C drive to another drive? My C drive doesn't have that much space =)

Possibly by adding this to your python script that runs the transformer

cache_dir = "/path/to/new/cache_directory"
CacheConfig().cache_dir = cache_dir


I tried doing this but it doesn't seem to work

Technology Innovation Institute org

@Kitachan You should be able to set environment variables HUGGINGFACE_HUB_CACHE or HF_HOME to where you want the cache to be.

See https://huggingface.co/docs/huggingface_hub/guides/manage-cache
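
For example (a minimal sketch; the target path is a placeholder for any drive with enough space, and the variable should be set before transformers/huggingface_hub are imported so it is picked up):

import os
os.environ["HF_HOME"] = "D:/hf-cache"  # hypothetical directory on a larger drive
# or: os.environ["HUGGINGFACE_HUB_CACHE"] = "D:/hf-cache/hub"

from transformers import AutoTokenizer  # import only after setting the variable
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")  # files land in the new cache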


This did work for my problem, thanks

I am having the same issue for Falcon 40b instruct. See the error below. Any solution?

ValueError Traceback (most recent call last)
Cell In[2], line 8
5 model = "tiiuae/falcon-40b-instruct"
7 tokenizer = AutoTokenizer.from_pretrained(model)
----> 8 pipeline = transformers.pipeline(
9 "text-generation",
10 model=model,
11 tokenizer=tokenizer,
12 torch_dtype=torch.bfloat16,
13 trust_remote_code=True,
14 device_map="auto",
15 )

File ~/anaconda3/envs/Falcon/lib/python3.10/site-packages/transformers/pipelines/__init__.py:788, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
784 # Infer the framework from the model
785 # Forced if framework already defined, inferred if it's None
786 # Will load the correct model if possible
787 model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
--> 788 framework, model = infer_framework_load_model(
789 model,
790 model_classes=model_classes,
791 config=config,
792 framework=framework,
793 task=task,
794 **hub_kwargs,
795 **model_kwargs,
796 )
798 model_config = model.config
799 hub_kwargs["_commit_hash"] = model.config._commit_hash

File ~/anaconda3/envs/Falcon/lib/python3.10/site-packages/transformers/pipelines/base.py:279, in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
276 continue
278 if isinstance(model, str):
--> 279 raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
281 framework = "tf" if "keras.engine.training.Model" in str(inspect.getmro(model.__class__)) else "pt"
282 return framework, model

ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Hmm, I was wondering how much memory I needed to run it, and my computer didn't seem to have enough RAM =( Maybe I should try Google Colab, but it feels cumbersome to have to re-download the model every time.

I am on an M2 Max chip, 12-core CPU, 38-core GPU, 96 GB unified memory, 2 TB storage. It was downloading the .bin files, but then stopped and gave the error mentioned above. Still waiting for someone to answer it.

M2 Max with 96 GB: definitely not a machine/hardware issue!

Same error as many of you:

raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Does anyone know how to move the .cache file generated by the model data download from C drive to another drive? My C drive doesn't have that much space =)

Move your cache to a new directory
and use this:
import os
os.environ['HF_HOME'] = ':\/HuggingFace'


It really works, thx =)

Same here. :(

Was this problem fixed?

I have a Ryzen 9 7900X (12 cores) and 64 GB of RAM with 2 TB of disk space. Everything downloaded, but it doesn't run and gives the same error as the OP.

I am on an M2 Max chip, 12-core CPU, 38-core GPU, 96 GB unified memory, 2 TB storage. It was downloading the .bin files, but then stopped and gave the error mentioned above. Still waiting for someone to answer it.

It has not been resolved yet...

On AWS g5.12xlarge with 96 GB of VRAM and the Deep Learning AMI GPU PyTorch 2.0.0 (Ubuntu 20.04) 20230530, it works. ~$5/hour. Storage should be 100 GB.

Here are hopefully complete steps from zero to running Falcon-40B (tested on Falcon-40B-instruct):
On AWS g5.12xlarge with 96 GB of VRAM and the Deep Learning AMI GPU PyTorch 2.0.0 (Ubuntu 20.04) 20230530, it worked using the following steps:

  1. Assuming you know how to create a new EC2 instance and set it up. Let's say you name your EC2 instance "GpuTest".
  2. Open port 8888 toward your EC2 instance (i.e. "GpuTest") (this is how you will access the Jupyter notebook).
  3. You need to get the PEM key. Watch out for chmod and file privileges; it may complain. If it complains, "chmod 400 your_key.pem" should fix it (group and others should have no access).
  4. ssh -i "name of the pem key" ubuntu@dns-address-of-your-ec2-instance.com (please use the correct one).
  5. conda init
  6. Restart bash (or "ctrl-c" or "exit") and re-enter the instance (using step 4).
  7. conda activate pytorch
  8. jupyter notebook --no-browser
  9. Open a second terminal and execute: ssh -i "name of the pem key" -L localhost:8888:localhost:8888 ubuntu@dns-address-of-your-ec2-instance.com (this creates an SSH tunnel from your computer to the running instance). It will not work without this.
  10. At this point you should be able to go to your local browser, type http://127.0.0.1:8888 (or http://localhost:8888), and connect to Jupyter on the "GpuTest" instance (it will ask for a token, which should be visible in the output of step 8).
  11. Now comes the fun part:
  12. pip install transformers
  13. pip install einops
  14. pip install accelerate
  15. pip install xformers

At this point provided sample worked perfectly.

There will be a big business in AI cards with 100Gb on them.

Hi @Sloba,
I am not using AWS, but an M2 Max MacBook. Do you have any recommendation for macOS systems that do not have NVIDIA but do have MPS (CPU and GPU on the same chip)?
Thanks

Hi @phdykd , unfortunately no. I think quantization will be needed due to size.

I can't run this on my machine because I don't have the hardware, but I was able to get past the above errors by adjusting the code as follows, specifically the model = line:

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

I tried your method and the code worked. However, same as you, my hardware could not support it (only an 8 GB GPU).
Here is my modification:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

To those who have questions not related to this, please start another discussion. This one is for discussing "ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,)." Does anyone know how to solve this problem?

Newbie here, so take this with a grain of salt. In my case, I was running out of GPU memory. Remember, even 'small' 7B models can require ~14 GB of memory during inference without optimization (e.g. quantization) when using 16-bit floats, so it's really easy to max out even modern GPUs. To confirm whether that is the case for you, you want to track memory usage during training (or inference). There are a bunch of easy ways to do that and rule it in or out, depending on your setup:

# Baseline GPU memory
nvidia-smi
nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,memory.used,memory.free,fan.speed,temperature.gpu
# Watch GPU memory interactively
watch nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,memory.used,memory.free,fan.speed,temperature.gpu
nvidia-smi -l 1
# Use HF trainer callbacks to integration with WanDB etc - https://huggingface.co/transformers/v3.4.0/main_classes/callback.html#available-callbacks
# Use Colab's Runtime > Manage sessions UI to watch what the usage is, interactively

There could be other reasons to hit this error, I am sure; for instance, I tried 7B on a TPU and that seemed to fail as well. But experimenting with memory usage was how I got through this on a 16 GB GPU (Colab Tesla T4) for a 7B model.
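
For a quick in-Python check of GPU headroom (a small sketch, assuming a CUDA GPU and PyTorch are available):

import torch

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # free vs. total memory on the current device
    print(f"GPU memory: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB total")
else:
    print("No CUDA GPU visible")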

Agree @Mlemoyne, this topic has pivoted.
I'm having the same issue. I'm running this on SageMaker but it doesn't seem to work. I'll consider giving Lambda Labs a shot later on to see whether this error is related or not.

Just for the record, the same happens with model falcon-7b-instruct in Sagemaker:

ValueError: Could not load model tiiuae/falcon-7b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Can someone give a final solution to the error which is

ValueError: Could not load model tiiuae/falcon-7b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Thanks :'')

I tried to run the 7B model on CPU only and got this message, removing device_map="auto" from the pipeline seemed to solve it.

I tried to run the 7B model on CPU only and got this message, removing device_map="auto" from the pipeline seemed to solve it.
This removes the error, but the process gets killed after a while. Do you know what the minimum CPU requirements are?

Removing device_map="auto" worked for me; the model loads on CPU.
Still, though, my Colab VM crashes because of lack of RAM...

You are correct, I was a bit too quick with my answer. I do not recommend removing device_map due to the amount of RAM it requires (it crashed for me as well after 20 GB of allocated RAM). Instead I found a solution by simply upgrading torch to 2.0.1, xformers to 0.0.20, and accelerate to 0.20.1.
Now it runs fine for me with device_map="auto".
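
For reference, those upgrades as a single command (versions exactly as reported above; adjust to your CUDA build if needed):

pip install --upgrade torch==2.0.1 xformers==0.0.20 accelerate==0.20.1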

Hello, I have a computer with 64 GB of RAM and two Xeon processors with 32 cores each. I did NOT enable the graphics card (it is very small). I downloaded the falcon-40b model files and installed the following in the Dockerfile:

ENV TRANSFORMERS_CACHE=/app/transformercache

RUN mkdir -p $TRANSFORMERS_CACHE

RUN pip3 install transformers

RUN pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu

RUN pip3 install einops
RUN pip3 install accelerate
RUN pip3 install xformers

When trying to test the script:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# This instruction loads the pre-trained model
model = "./model/falcon-40b"

rmodel = AutoModelForCausalLM.from_pretrained(
    model,
    offload_folder='/app/model/falcon-40b',
    trust_remote_code=True,
    device_map='auto',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    chunk_size_feed_forward=512000,
    cache_dir="./transformercache",
)

Python just exits:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> import transformers
>>> import torch
>>> model = "./model/falcon-40b"
>>> rmodel = AutoModelForCausalLM.from_pretrained(
...     model,
...     offload_folder='/app/model/falcon-40b',
...     trust_remote_code=True,
...     device_map='auto',
...     torch_dtype=torch.float16,
...     low_cpu_mem_usage=True,
...     chunk_size_feed_forward=512000,
...     cache_dir="./transformercache"
... )
Loading checkpoint shards:  67%|██████▋   | 6/9 [01:04<00:32, 10.75s/it] Killed
root@75b049882bd4:/app#

I guess it's because of memory. Does anyone have any idea what I can do?

Technology Innovation Institute org

We recommend having at least 80-100 GB of memory to fit the 40B model comfortably.

If you do not have that much memory available, you can have a look at FalconTune to run the model in 4-bit, or at this blogpost from HuggingFace.
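
For the 4-bit route through transformers/bitsandbytes, a minimal sketch (assuming a recent transformers with BitsAndBytesConfig, plus bitsandbytes and accelerate installed; this is not the FalconTune path itself, just the general quantized-loading idea):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)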

Still getting this error for 7b:

Traceback (most recent call last):
  File "/home/sehadn1/falcon7b-modif.py", line 14, in <module>
    pipeline = transformers.pipeline(
  File "/home/sehadn1/transformers/src/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/sehadn1/transformers/src/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-7b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>).

Possibly related, I get "The model 'RWForCausalLM' is not supported for text-generation".

I do see that this warning pops up on 7b, which then goes on to work fine, so it might be a misleading warning here; just thought I'd share it.

pip install transformers
pip install einops
pip install accelerate
pip install xformers

If you pip install these packages, it may be OK.

If you pip install these packages, the problem of "ValueError: Could not load model tiiuae/falcon-7b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>)" may be solved.

I am loading the model onto an A6000 GPU with 48 GB of VRAM, with torch.int8. I am getting the same error:
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Kernel Restarting
The kernel for Desktop/LLM/Falcon/Fl.ipynb appears to have died. It will restart automatically.
It does not work on an M2 Apple MacBook.

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

I was able to get past the AutoModelForCausalLM error in falcon-7b-instruct by using the line @alexwall77 provided below:

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

Thank you, Alex!



I got the same bug in google colab. Switched to using GPU and then it worked fine.


How did you solve the problem? I ran it on an EC2 instance with 8x A100 and also got the same problem.

I am still getting the same error. Unable to load Falcon-40b-instruct or Falcon-40b. This is the error:
ValueError: Could not load model tiiuae/falcon-40b with any of the following classes: (<class
'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Also, I have enough space in RAM. It could be an issue with text generation. Any help on this?

I found the solution to this.

I had to create a folder to offload the existing weights to, which I named "device_map_weights", to get it to work, though.

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(
    model,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="device_map_weights"
    )
model = AutoModelForCausalLM.from_pretrained(
    model,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="device_map_weights"
    )

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Have we solved the original issue?
