Why does it show `language_model = meta-llama/Llama-2-7b-hf` and not Vicuna?

#13
by HenningBlue - opened

Hi,
When initialising llava-1.5-7b with the default config, the model object shows `language_model` pointing to `meta-llama/Llama-2-7b-hf`. I would expect it to point to a Vicuna checkpoint. Is the attribute just displayed incorrectly? Or how does it load and utilise the Vicuna model?
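
For reference, this is roughly how I'm seeing it (repo id assumed to be `llava-hf/llava-1.5-7b-hf`; inspecting the config alone avoids downloading the weights):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# The text backbone config carries a _name_or_path metadata field,
# which is where "meta-llama/Llama-2-7b-hf" shows up.
print(type(config.text_config).__name__)   # LlamaConfig
print(config.text_config._name_or_path)    # meta-llama/Llama-2-7b-hf
```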

Llava Hugging Face org

It does not use the Vicuna model; I am not sure I follow why it should?

Llava Hugging Face org

I think what @HenningBlue is referring to is that Llava-1.5 is a checkpoint whose training started from the Vicuna checkpoint, which is true. We could replace that config value with the Vicuna 7B checkpoint, but both models use the same underlying architecture. Would you be happy to open a PR for that?
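
A minimal sketch of what that config change could look like, assuming the intended target is `lmsys/vicuna-7b-v1.5` (in practice this would land as a pull request against the Hub repo's `config.json`):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# _name_or_path is display metadata only; it does not change which weights load.
config.text_config._name_or_path = "lmsys/vicuna-7b-v1.5"

# create_pr=True opens a pull request against the Hub repo instead of
# pushing the change directly.
config.push_to_hub("llava-hf/llava-1.5-7b-hf", create_pr=True)
```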


Yes, I was referring to the fact that llava-1.5 starts from a Vicuna checkpoint, and I was wondering whether Llama2 and Vicuna have the same implementation in the transformers library. Perhaps for clarity, it would be good to change the checkpoint name to Vicuna 7/13B.
But how would that affect the underlying models used in this LLaVA implementation? If it is currently using a plain Llama2 LM, then the behaviour and performance will of course differ from the fine-tuned Vicuna. However, I'm unsure about the implementation details of the LLaVA class in transformers.

Llava Hugging Face org

Yes, Vicuna uses the Llama architecture if I am not mistaken.
The performance of the model, logits, and outputs are the same as the original; we always make sure of that when porting a model to the library!
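
You can check this quickly from the configs alone (repo ids assumed; `meta-llama/Llama-2-7b-hf` is gated, so an access token may be needed):

```python
from transformers import AutoConfig

vicuna = AutoConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
llama = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

# Both resolve to the same config class and model architecture.
print(type(vicuna).__name__, vicuna.architectures)  # LlamaConfig ['LlamaForCausalLM']
print(type(llama).__name__, llama.architectures)    # LlamaConfig ['LlamaForCausalLM']
```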

Llava Hugging Face org

Llava in transformers uses `AutoModel.from_config` to define the vision model and the language model. Thus the language model uses the Llama architecture.
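
Roughly, a sketch of that wiring (the actual internals of `modeling_llava.py` differ in detail; the key point is that `from_config` builds the architecture only, and the fine-tuned weights then come from the llava checkpoint itself):

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Build the submodules from their configs (architecture only, random weights).
vision_tower = AutoModel.from_config(config.vision_config)             # CLIPVisionModel
language_model = AutoModelForCausalLM.from_config(config.text_config)  # LlamaForCausalLM

# When you call LlavaForConditionalGeneration.from_pretrained(...), the
# weights loaded into this Llama-architecture LM are the ones stored in the
# llava checkpoint, i.e. the Vicuna-finetuned weights, not plain Llama-2.
```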
