When querying the model for text generation I get this: The model 'RWForCausalLM' is not supported for text-generation.

#8 · opened by airtable

I am using LangChain to load falcon-40b on an H100 GPU machine, but I get the message below and nothing is generated when I pass a context to it using FAISS.

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

This is how I am loading the model and providing the FAISS embeddings to it:

import pickle

import torch
from transformers import AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain


def load_embeddings(store_name, path):
    # Load a previously pickled FAISS vector store from disk.
    with open(f"{path}/faiss_{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
    return VectorStore

Embedding_store_path = "./dbfs"


# hf_embed = load_embeddings(store_name='huggingface_fm_lambdalabs_faiss',
hf_embed = load_embeddings(store_name='store_template',
                           path=Embedding_store_path)

def get_similar_docs(question, similar_doc_count):
  return hf_embed.similarity_search(question, k=similar_doc_count)

def build_qa_chain():
  torch.cuda.empty_cache()
  model_name = "tiiuae/falcon-40b"
 
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  instruct_pipeline = pipeline(model=model_name, tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", 
                               return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50)
 
  # Defining our prompt content.
  # langchain will load our similar documents as {context}
  template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
  Instruction: 
  You are experienced in .. and your job is to help provide the best answer related to .... 
  Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.
 
  {context}
 
  Question: {question}
 
  Response:
  """
  prompt = PromptTemplate(input_variables=['context', 'question'], template=template)
 
  hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
  # Set verbose=True to see the full prompt:
  return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)

qa_chain = build_qa_chain()

def answer_question(question):
  similar_docs = get_similar_docs(question, similar_doc_count=1)
  result = qa_chain({"input_documents": similar_docs, "question": question})
  
  print("question: " + question)
  print(" ")
  print("Answer: ")
  print(result['output_text'])
  print(" ")  
  print("Sources")
  print(" ")
  for d in result["input_documents"]:
    source_id = d.metadata["source"]
    print(d.page_content)
    print("Source " + source_id)
    print(" ")
    
answer_question("<question>?")
while True:
    query = input("\nEnter a query: ")
    if query == "exit":
        break

    # Get the answer from the chain
    answer_question(query)

I get the same error just by running the "How to Get Started with the Model" example.

+1 I also get this error


@Seledorn :)

and this:
File ~/.cache/huggingface/modules/transformers_modules/falcon40b/modelling_RW.py:32, in Linear.forward(self, input)
     31 def forward(self, input: torch.Tensor) -> torch.Tensor:
---> 32     ret = input @ self.weight.T
     33     if self.bias is None:
     34         return ret

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)


same problem here!

Same here with the "How to Get Started" example.
Model: tiiuae/falcon-7b-instruct

Technology Innovation Institute org

Sorry about the delay. The The model 'RWForCausalLM' is not supported for text-generation message comes from the model not yet being integrated into the core of the transformers library. It is just a warning, and generation should still follow afterwards. See for example https://twitter.com/camenduru/status/1662225039352283137?s=20 for a video where it is working correctly.

It will take a little bit of time to integrate the model fully into the transformers library, but hopefully in a couple of weeks this warning will go away.
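
For anyone hitting this, here is a minimal sketch of the point above (assuming the smaller tiiuae/falcon-7b-instruct checkpoint just for a quick check): with trust_remote_code=True the custom RWForCausalLM class is loaded from the model repo, the pipeline prints the "not supported" line while it is being built, and generation still runs afterwards.

import torch
from transformers import AutoTokenizer, pipeline

model_name = "tiiuae/falcon-7b-instruct"  # assumption: smaller variant, same custom RWForCausalLM code
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The "not supported for text-generation" warning is printed during pipeline construction,
# because RWForCausalLM is not in the library's list of known causal-LM classes yet.
generator = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # pulls RWForCausalLM from the model repo
    device_map="auto",
)

# Generation still works despite the warning.
print(generator("Girafatron is obsessed with giraffes.", max_new_tokens=50)[0]["generated_text"])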

FalconLLM changed discussion status to closed

@FalconLLM Thanks, Falcon-7B is generating output, but I am unable to load Falcon-40B on a single Nvidia H100 GPU with 80 GB of VRAM; opening a separate issue.
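
As a side note on the memory question, a rough back-of-the-envelope sketch (weights only, in bfloat16; activations, KV cache and CUDA overhead come on top), which is why the 40B checkpoint is tight on a single 80 GB card:

# ~40e9 parameters * 2 bytes (bfloat16) is roughly 80 GB for the weights alone,
# which already saturates one 80 GB H100 before anything else is allocated.
params = 40e9
bytes_per_param = 2  # bfloat16
print(f"approx. weight memory: {params * bytes_per_param / 1e9:.0f} GB")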

@FalconLLM any updates on this issue?

For me it was resolved with pip install git+https://github.com/huggingface/transformers

For me it was resolved with pip install git+https://github.com/huggingface/transformers

It worked for me as well. Thanks!!

Same here, thanks.

For me it was resolved with pip install git+https://github.com/huggingface/transformers

For me it was resolved with pip install git+https://github.com/huggingface/transformers

Same for me. It also sped up inference drastically for the 7b-instruct model. Thanks a lot!
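
For anyone checking whether the from-source install actually took effect, a quick sanity check (assumption: a transformers build recent enough to register the Falcon architecture natively, e.g. via FalconForCausalLM):

import transformers

# A git install reports a dev version string, e.g. something like "4.31.0.dev0".
print(transformers.__version__)

# If Falcon is registered in core transformers, this prints True and the
# "RWForCausalLM is not supported" warning should no longer appear.
print(hasattr(transformers, "FalconForCausalLM"))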

Still getting this issue

It's not working for text generation. It says: AttributeError: module transformers has no attribute RWForCausalLM
