[Bug] Does not run

#2
by catid - opened

Created a conda environment with the latest transformers, pytorch, and einops.

Getting this error from the provided example script:

(supercharger) ➜ supercharger git:(main) ✗ python test_falcon.py
Traceback (most recent call last):
  File "/home/catid/sources/supercharger/test_falcon.py", line 8, in <module>
    pipeline = transformers.pipeline(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Yeah, same problem here. The pipeline can't even load it through AutoModelForCausalLM. When I try it directly, like model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b-instruct"), it still throws an error because the trust_remote_code parameter isn't set to True. I don't know how to fix it, though.

@Crenox

This worked for me

model = AutoModelForCausalLM.from_pretrained(
"tiiuae/falcon-40b-instruct", trust_remote_code=True
)

While generating text with falcon-40b-instruct, I'm getting the error below:
AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

You need to upgrade to torch 2.0; that fixed the attribute error for me.
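For reference, a minimal check of your environment (just a sketch to confirm the upgrade, not part of anyone's script above):

import torch
import torch.nn.functional as F

# scaled_dot_product_attention was only added in PyTorch 2.0
print(torch.__version__)
print(hasattr(F, "scaled_dot_product_attention"))  # should print True on torch >= 2.0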

After trying @supdeva's fix to create the model, I have this code:

model_name = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

The tokenizer is created successfully, but the model creation yields the error:

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

What am I doing wrong?

That's just a warning, it should run fine.

Otherwise the actual error would be printed below it.

Right you are, @eastwind, thank you!
It does run past that point, and there was a different error, which I managed to eliminate.
Thanks!

What was the other error out of curiosity?

Now it seems to run. For anyone reading this, here is the code with the slight changes needed to get it running:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_name = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

But I have never run such a huge model, and I am at a loss as to what kind of hardware it would need to actually work in a useful way.
I have it running on a machine with 8 x A100 80GB GPUs.
It has been running for at least 10 (quite costly) minutes now and does not seem to have produced any output yet.

So what kind of hardware does this monster need?

What was the other error out of curiosity?

CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

So, running out of GPU memory.
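For a rough sense of why: a back-of-the-envelope sketch (assumptions: ~40B parameters, 2 bytes per parameter in bfloat16, ignoring activations and the KV cache) already puts the weights alone near the capacity of a single 80GB card.

# Rough memory estimate for the weights alone (assumptions: ~40B params, bfloat16)
params = 40e9
bytes_per_param = 2                                   # bfloat16 / float16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB just for the weights")  # roughly 75 GiB, before activations and KV cache

So a single 80GB GPU is already tight, which is why device_map="auto" can end up offloading to CPU or disk, and why an under-sized setup tends to surface as CUDA/cuBLAS memory errors.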

OK, after all the work and some $20 to RunPod, I got the glorious answer to the provided example prompt:

Result: Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.
Daniel: Hello, Girafatron!
Girafatron:: Divineinity215183SegSeg Hansonsignal HolmesOSS Seg Seg Rydergate Cowtown OgSegurities DennSys548AdvisorAdvisor Wachwachmeter603campus Ley Wie Ger Hendersonpositionpositionnement Seg Kitt Kitt Kitt FranklintownICTcorp Cetroniccorp Hoy Museobjet Dans DansMLSIngredientsProductionsCadCentre coinc Knight lust Sie Wer865bottom Cet Zimmer Nolandivision Wie427 unoGate Wars positivism Saunders esp sans uno Court Sie Barnettfields981pagesviews esp Danncampus esp sans Francisco Francisco Mesa tres tres Holmes dit Wol esp esp sans el dit Weather pour el poss MullerSys577 Denncampusposition Wer258Cad Denn respons responsabilidad Zum complet Dannforth Dixon Andrewsport891housing Baumgartenoperator Wie427world tout

Great to know...

I am trying to get tiiuae/falcon-40b-instruct working locally on a single A100 80GB GPU. Using captain-fin's code above I got it to go further. Now I am seeing the following error...

(pytorch-env) administrator@ubuntu:~/falcon-40b-instruct$ python3 captain-fim.py
Traceback (most recent call last):
  File "/home/administrator/falcon-40b-instruct/captain-fim.py", line 8, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    return model_class.from_pretrained(
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2777, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2871, in _load_pretrained_model
    raise ValueError(
ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format.
(pytorch-env) administrator@ubuntu:~/falcon-40b-instruct$

I am assuming this is an issue with finding the model weights. I have a copy in the same folder as the code above, and I put another copy in a folder named tiiuae/falcon-40b-instruct.

I would appreciate any advice.

@CloudCIX In the article "How 🤗 Accelerate runs very large models thanks to PyTorch" I found this piece of information.
I guess it is what you need here.

If the device map computed automatically requires some weights to be offloaded on disk because you don't have enough GPU and CPU RAM, you will get an error indicating you need to pass a folder where the weights that should be stored on disk will be offloaded:

ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them.

Adding this argument should resolve the error:

import torch
from transformers import AutoModelForCausalLM

# Will go out of RAM on Colab
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", offload_folder="offload", torch_dtype=torch.float16
)
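A minimal sketch of the same fix applied to the Falcon checkpoint from this thread (offload_folder can be any writable local directory; this just mirrors the snippet above):

import torch
from transformers import AutoModelForCausalLM

# Sketch: the offload_folder fix applied to the checkpoint discussed in this thread
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    offload_folder="offload",   # any writable local path for the offloaded weights
)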

Does anyone have code to cache this model to and load it from a local directory? I tried this and it does not work:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import os

def cache_model(model_name, cache_dir="./"):
    model_dir = os.path.join(cache_dir, model_name.replace("/", "_"))
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
        model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=model_dir, trust_remote_code=True)
        model.save_pretrained(model_dir)
    return model_dir

def download_model(model_name, cache_dir="./"):
    model_dir = cache_model(model_name, cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir, cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, cache_dir=cache_dir, trust_remote_code=True)
    return model, tokenizer

def generate_text(model, tokenizer, prompt, **kwargs):
    text_generation_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    sequences = text_generation_pipeline(prompt, **kwargs)
    return [seq['generated_text'] for seq in sequences]

When I go to load the model I get OSError: ./tiiuae_falcon-40b-instruct does not appear to have a file named config.json. Checkout 'https://huggingface.co/./tiiuae_falcon-40b-instruct/None' for available files.

You can clone the model repository into your own folder like this; make sure you're in your desired directory ('path/to/model') first:

git lfs install
git clone https://huggingface.co/tiiuae/falcon-40b-instruct

Then you can load it like this:

model = AutoModelForCausalLM.from_pretrained("path/to/model", trust_remote_code=True)
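For example, assuming the repository was cloned to ./falcon-40b-instruct as above, a minimal fully local load might look like this sketch (local_files_only=True keeps transformers from hitting the network):

from transformers import AutoTokenizer, AutoModelForCausalLM

local_path = "./falcon-40b-instruct"   # assumption: the git clone from the step above
tokenizer = AutoTokenizer.from_pretrained(local_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    local_path,
    trust_remote_code=True,    # still needed: the custom RWForCausalLM code lives inside the cloned repo
    local_files_only=True,
    device_map="auto",
)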

When loading a local model, why should we set trust_remote_code to True? What if I need offline execution?
