[Bug] Does not run

#2
by catid - opened

Created a conda environment with the latest transformers, pytorch, and einops.

Getting this error from the provided example script:

(supercharger) ➜ supercharger git:(main) ✗ python test_falcon.py
Traceback (most recent call last):
  File "/home/catid/sources/supercharger/test_falcon.py", line 8, in <module>
    pipeline = transformers.pipeline(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 788, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/catid/mambaforge/envs/supercharger/lib/python3.10/site-packages/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model tiiuae/falcon-40b-instruct with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,).

Yeah, same problem here. The pipeline can't even load it through AutoModelForCausalLM. When I try it directly, like model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b-instruct"), it still throws an error because the trust_remote_code parameter isn't set to True. I don't know how to fix it, though.

@Crenox

This worked for me

model = AutoModelForCausalLM.from_pretrained(
"tiiuae/falcon-40b-instruct", trust_remote_code=True
)

While generating text with falcon-40b-instruct, I'm getting the error below:
AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

You need to upgrade to torch 2.0; that fixed the attribute error for me.
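For reference, a minimal check of your environment (just a sketch to confirm the upgrade, not part of anyone's script above):

import torch
import torch.nn.functional as F

# scaled_dot_product_attention was only added in PyTorch 2.0
print(torch.__version__)
print(hasattr(F, "scaled_dot_product_attention"))  # should print True on torch >= 2.0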

After trying @supdeva's fix to create the model, I have this code:

model_name = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

The tokenizer is created successfully, but the model creation yields the error:

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

What am I doing wrong?

That's just a warning, it should run fine.

Otherwise the actual error would be printed below it.

Right you are, @eastwind, thank you!
It does run past that point, and there was a different error, which I managed to eliminate.
Thanks!

What was the other error out of curiosity?

Now it seems to run. For anyone reading this, here is the code with the slight changes needed to get it running:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_name = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

But I have never run such a huge model, and I am at a loss as to what kind of hardware it would need to actually work in a useful way.
I have it running on a machine with 8 x A100 80GB GPUs.
It has been running for at least 10 (quite costly) minutes now and does not seem to have produced any output yet.

So what kind of hardware does this monster need?

What was the other error out of curiosity?

CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

So, running out of GPU memory.
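For a rough sense of why: a back-of-the-envelope sketch (assumptions: ~40B parameters, 2 bytes per parameter in bfloat16, ignoring activations and the KV cache) already puts the weights alone near the capacity of a single 80GB card.

# Rough memory estimate for the weights alone (assumptions: ~40B params, bfloat16)
params = 40e9
bytes_per_param = 2                                   # bfloat16 / float16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB just for the weights")  # roughly 75 GiB, before activations and KV cache

So a single 80GB GPU is already tight, which is why device_map="auto" can end up offloading to CPU or disk, and why an under-sized setup tends to surface as CUDA/cuBLAS memory errors.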

OK, after all the work and some $20 to RunPod, I got the glorious answer to the provided example prompt:

Result: Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.
Daniel: Hello, Girafatron!
Girafatron:: Divineinity215183SegSeg Hansonsignal HolmesOSS Seg Seg Rydergate Cowtown OgSegurities DennSys548AdvisorAdvisor Wachwachmeter603campus Ley Wie Ger Hendersonpositionpositionnement Seg Kitt Kitt Kitt FranklintownICTcorp Cetroniccorp Hoy Museobjet Dans DansMLSIngredientsProductionsCadCentre coinc Knight lust Sie Wer865bottom Cet Zimmer Nolandivision Wie427 unoGate Wars positivism Saunders esp sans uno Court Sie Barnettfields981pagesviews esp Danncampus esp sans Francisco Francisco Mesa tres tres Holmes dit Wol esp esp sans el dit Weather pour el poss MullerSys577 Denncampusposition Wer258Cad Denn respons responsabilidad Zum complet Dannforth Dixon Andrewsport891housing Baumgartenoperator Wie427world tout

Great to know...

I am trying to get tiiuae/falcon-40b-instruct working locally on a single A100 80GB GPU. Using captain-fin's code above I got it to go further. Now I am seeing the following error...

(pytorch-env) administrator@ubuntu:~/falcon-40b-instruct$ python3 captain-fim.py
Traceback (most recent call last):
  File "/home/administrator/falcon-40b-instruct/captain-fim.py", line 8, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    return model_class.from_pretrained(
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2777, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/administrator/miniconda3/envs/pytorch-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2871, in _load_pretrained_model
    raise ValueError(
ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format.
(pytorch-env) administrator@ubuntu:~/falcon-40b-instruct$

I am assuming this is an issue with finding the model weights. I have a copy in the same folder as the code above, and I put another copy in a folder named tiiuae/falcon-40b-instruct.

I would appreciate any advice.

@CloudCIX In the article "How 🤗 Accelerate runs very large models thanks to PyTorch" I found this piece of information.
I guess it is what you need here.

If the device map computed automatically requires some weights to be offloaded on disk because you don't have enough GPU and CPU RAM, you will get an error indicating you need to pass a folder where the weights that should be stored on disk will be offloaded:

ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them.

Adding this argument should resolve the error:

import torch
from transformers import AutoModelForCausalLM

# Will go out of RAM on Colab
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", offload_folder="offload", torch_dtype=torch.float16
)
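A minimal sketch of the same fix applied to the Falcon checkpoint from this thread (offload_folder can be any writable local directory; this just mirrors the snippet above):

import torch
from transformers import AutoModelForCausalLM

# Sketch: the offload_folder fix applied to the checkpoint discussed in this thread
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    offload_folder="offload",   # any writable local path for the offloaded weights
)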

Does anyone have code to cache this model to and load it from a local directory? I tried this and it does not work:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import os

def cache_model(model_name, cache_dir="./"):
    model_dir = os.path.join(cache_dir, model_name.replace("/", "_"))
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
        model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=model_dir, trust_remote_code=True)
        model.save_pretrained(model_dir)
    return model_dir

def download_model(model_name, cache_dir="./"):
    model_dir = cache_model(model_name, cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir, cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, cache_dir=cache_dir, trust_remote_code=True)
    return model, tokenizer

def generate_text(model, tokenizer, prompt, **kwargs):
    text_generation_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    sequences = text_generation_pipeline(prompt, **kwargs)
    return [seq['generated_text'] for seq in sequences]

When I go to load the model I get OSError: ./tiiuae_falcon-40b-instruct does not appear to have a file named config.json. Checkout 'https://huggingface.co/./tiiuae_falcon-40b-instruct/None' for available files.

You can clone the model repository into your own folder like this; make sure you're in your desired directory ('path/to/model') first:

git lfs install
git clone https://huggingface.co/tiiuae/falcon-40b-instruct

Then you can load it like this:

model = AutoModelForCausalLM.from_pretrained("path/to/model", trust_remote_code=True)
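For example, assuming the repository was cloned to ./falcon-40b-instruct as above, a minimal fully local load might look like this sketch (local_files_only=True keeps transformers from hitting the network):

from transformers import AutoTokenizer, AutoModelForCausalLM

local_path = "./falcon-40b-instruct"   # assumption: the git clone from the step above
tokenizer = AutoTokenizer.from_pretrained(local_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    local_path,
    trust_remote_code=True,    # still needed: the custom RWForCausalLM code lives inside the cloned repo
    local_files_only=True,
    device_map="auto",
)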

When loading a local model, why should we set trust_remote_code to True? What if I need offline execution?
