Bad responses compared to LLaVA-13b-delta-v0
I found that this model consistently gives worse responses than the LLaVA-13b-delta-v0 model.
LLaVA-13b-delta-v0 consistently provides an accurate response to this question:
LLaVA-13b-delta-v0-science_qa consistently provides inaccurate responses:
This is with 4-bit 128g quantized models.
The same applies to text recognition: the science_qa model does get it right as well, just not as reliably.
I should also note that the model is missing the added tokens. I resized the embeddings myself and it seems to work, but I'm not sure whether that affected the model in other ways; I get more or less the same results on a model without the resize as well.
import argparse
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from llava.utils import disable_torch_init

DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def resize(args):
    # Model
    disable_torch_init()
    model_name = os.path.expanduser(args.model)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

    # Register the image tokens the delta weights expect, then grow the embedding table
    tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    print(len(tokenizer))  # returns 32000 regardless
    model.resize_token_embeddings(len(tokenizer) + 3)  # I just manually incremented, seems to work

    model.save_pretrained(os.path.expanduser(args.target))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, required=True)
    parser.add_argument("--target", type=str, required=True)
    args = parser.parse_args()
    resize(args)
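For what it's worth, a small hedged addition on top of the script above: saving the tokenizer alongside the resized model should keep the added tokens and the enlarged embedding table in sync when the target directory is loaded later. This is one extra line at the end of resize(); I have not verified it against the 4-bit 128g conversion flow.

    # Sketch only: persist the updated tokenizer next to the resized weights
    # so the added image tokens travel with the model.
    tokenizer.save_pretrained(os.path.expanduser(args.target))

Invoked as, for example (the script name and paths are illustrative):

    python resize_tokens.py --model ./LLaVA-13b-delta-v0-science_qa --target ./LLaVA-13b-delta-v0-science_qa-resized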
Excuse me, may I ask which libraries are required to run "LLaVA-13b-delta-v0"? I attempted to install them using the commands:
pip install git+https://github.com/huggingface/transformers
pip install tokenizers --upgrade
However, I persistently encounter a recursion error. Could you please guide me through the appropriate library installation process? I sincerely appreciate your assistance. Thank you kindly.
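(For reference, the only other route I am aware of is installing everything from the LLaVA repository itself rather than only upgrading transformers and tokenizers, roughly:

    git clone https://github.com/haotian-liu/LLaVA.git
    cd LLaVA
    pip install -e .

though I am not certain that is the intended setup for the delta weights.)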