Issues when running the LTU-AS model

#2
by Yoohao - opened

Hi all,

I have installed the latest SpeechBrain development version, but when I tried to run LTU-AS inference, I hit an error. The first command (from speechbrain.inference.multimodal import LTU_AS) fails with "No module named 'speechbrain.inference.multimodal'". Does the current SpeechBrain version support LTU-AS? I also noticed that the link to the training information (https://github.com/speechbrain/speechbrain/tree/develop/recipes/OpenASQA/ltu-as) is empty.

Looking forward to your reply. :)

SpeechBrain org
edited Aug 7, 2024

Hi @Yoohao, thank you for your interest. The PR https://github.com/speechbrain/speechbrain/pull/2550 is not yet merged into the dev branch; we plan to merge it soon. In the meantime, you can install this branch https://github.com/BenoitWang/speechbrain/tree/speech_llm and run the recipe.

Best,
Yingzhi
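
Before switching branches, it can help to confirm whether the installed SpeechBrain actually ships the module named in the error. The following is a minimal sketch using only the Python standard library; `module_available` is a hypothetical helper name, not part of SpeechBrain.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    Note: locating a submodule imports its parent packages.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `speechbrain`) is itself missing.
        return False


if __name__ == "__main__":
    target = "speechbrain.inference.multimodal"
    print(target, "available:", module_available(target))
```

If this reports that the module is unavailable, the installed version predates the LTU-AS code, which is consistent with the PR not being merged yet.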

Hi Yingzhi, thank you for your timely help. The first problem is solved, but unfortunately another one has come up.

When I run the command "ltu_as = LTU_AS.from_hparams(source="speechbrain/speech-llm-LTU-AS-openasqa")" to load the model, it reports a mismatch between the architecture and the parameters:
"RuntimeError: Error(s) in loading state_dict for LLAMA2:
size mismatch for model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1])."
I found that the model is built by speechbrain/lobes/models/huggingface_transformers/llama2.py, while the LLM used for this project is llama3. I hope this feedback helps improve the project.
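
The shapes in the error message are suggestive: 8388608 is exactly 4096 × 4096 / 2, which is what one would expect if the freshly built model stores this layer as 4-bit values packed two per byte into a single column (as bitsandbytes does for 4-bit layers), while the checkpoint holds the unpacked [4096, 4096] matrix. This reading is an assumption and is not confirmed anywhere in this thread. The arithmetic can be checked with a small sketch, where `packed_4bit_shape` is a hypothetical helper:

```python
def packed_4bit_shape(rows: int, cols: int) -> tuple[int, int]:
    """Shape of a (rows, cols) matrix after packing two 4-bit values per byte
    into one flat column. Assumes an even number of elements."""
    numel = rows * cols
    if numel % 2:
        raise ValueError("expected an even number of elements")
    return (numel // 2, 1)


checkpoint_shape = (4096, 4096)  # shape reported for the checkpoint
model_shape = (8388608, 1)       # shape reported for the current model

print(packed_4bit_shape(*checkpoint_shape) == model_shape)  # → True
```

If that is indeed the cause, a library version mismatch in how quantized layers are built and loaded would be a plausible explanation.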

SpeechBrain org

Hi @Yoohao, I didn't run into this problem. Could you try the huggingface version mentioned here https://github.com/BenoitWang/speechbrain/blob/speech_llm/recipes/OpenASQA/ltu-as/extra_requirements.txt and see if it works? As for the LLAMA2 script, we use it for all the llama series models since there's no difference in the architecture.

Best,
Yingzhi

Hi yingzhi,

Thanks again for your kind help. The previous suggestion worked well, but unfortunately there is still an issue. When I run "processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")", it raises the error "Wrong index found for <|0.02|>: should be None but found 50366." It seems to be a problem with the timestamp tokens in the vocabulary. Is there something I'm still missing?

SpeechBrain org

Hi @Yoohao, I am using transformers==4.34.0 and it works fine. I guess you may have been using a lower version. Thanks for reporting; I will add this info to the model card.

The above problem is solved, but new problems have emerged:
model_id = "openai/whisper-large-v3"
processor = AutoProcessor.from_pretrained(model_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\models\auto\processing_auto.py", line 287, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 226, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 270, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 2066, in _from_pretrained
    raise ValueError(
ValueError: Wrong index found for <|0.02|>: should be None but found 50366.

SpeechBrain org

Hi @Simon13456, I used the following combo without issues. Could you try it and see if it works?

torch==2.2.2
transformers==4.34.0
tokenizers==0.14.1
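
To compare a local environment against this combination, a small standard-library check is one option. This is only a sketch: the pins are copied from the list above, while `installed_version`, `check_pins`, and the `lookup` parameter are hypothetical names introduced for illustration. Local build suffixes such as "+cu121" are ignored when comparing.

```python
from importlib import metadata

# Versions reported as working in this thread.
PINS = {"torch": "2.2.2", "transformers": "4.34.0", "tokenizers": "0.14.1"}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to an
    (expected, found) pair. An empty dict means everything matches."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Drop a local build suffix (e.g. "2.2.2+cu121") before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

Running the script prints one line per package that is missing or at a different version, and prints nothing when the environment matches the pins.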

@yingzhi Thanks for your help. In addition to these, I upgraded huggingface-hub to solve the problem.

SpeechBrain org

Good to know! Thank you as well for testing!
