Model Card for LlavaNext BLP4k

This is a LlavaNext model fine-tuned on BLP4k, a synthetic dataset of bar, line, and pie charts. Its goal is to detect whether a chart image contains a misleading element. The misleading elements it targets are limited to three types: a non-zero baseline in bar charts, omitted x-axis data points in line charts, and segments that do not sum to 100% in pie charts.
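The three criteria above can be made concrete with simple rule-based checks on a chart's underlying data. This is a hypothetical illustration only: the model itself works from chart images, and the function names, the 0.5-point rounding tolerance for pie charts, and the uneven-spacing heuristic for line charts are assumptions, not part of BLP4k or the model.

```python
from typing import Sequence

# Hypothetical rule-based checks that illustrate the three misleading elements.
# They inspect a chart's data and axis settings, not a rendered image.


def bar_has_nonzero_baseline(axis_min: float) -> bool:
    """Bar chart: the value axis starts above zero, so bar lengths are not proportional."""
    return axis_min > 0


def line_has_omitted_x_points(x_values: Sequence[int]) -> bool:
    """Line chart: plotted x values (e.g. years) are unevenly spaced, suggesting skipped points.

    Assumption: the full series is evenly spaced, so uneven steps imply omissions.
    """
    steps = {b - a for a, b in zip(x_values, x_values[1:])}
    return len(steps) > 1


def pie_does_not_sum_to_100(percentages: Sequence[float], tol: float = 0.5) -> bool:
    """Pie chart: segment percentages do not add up to 100, beyond a rounding tolerance."""
    return abs(sum(percentages) - 100.0) > tol


print(bar_has_nonzero_baseline(50))                         # True
print(line_has_omitted_x_points([2015, 2016, 2017, 2020]))  # True
print(pie_does_not_sum_to_100([40, 35, 30]))                # True
```

The tolerance in the pie check keeps rounded labels such as 33.3% + 33.3% + 33.3% from being flagged.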

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

Developed by: Team Snoopy
Model type: Multimodal Image + Text
Fine-tuned from model: LlavaNext (llava-hf/llava-v1.6-mistral-7b-hf)

Model Sources

Repository: LlavaNext BLP4k (adapter weights only; chart-misinformation-detection/hf-llava-next-finetune-blp4k)
Demo: LlavaNext BLP4k

Uses

Direct Use

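Given a bar, line, or pie chart image, the model can be prompted to evaluate whether the chart is misleading and, if so, to explain which misleading element it contains.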

Recommendations

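Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. In particular, the model was trained only on synthetic charts and covers only three types of misleading elements, so it may miss other kinds of misleading design and may not generalize to real-world charts. More information is needed for further recommendations.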

How to Get Started with the Model

Use the code below to get started with the model. It requires a GPU, because the base model is loaded with 4-bit bitsandbytes quantization.

# Load model
from PIL import Image
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization (bitsandbytes) reduces GPU memory use for the 7B base model
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)

processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
    device_map="auto",
)

# Attach the fine-tuned BLP4k adapter weights to the quantized base model
model = PeftModel.from_pretrained(model, adapter_weights_repo)
model.eval()

# preprocess input
image_url = "https://example.com/chart.png"  # placeholder: replace with the URL of your chart image
prompt = "[INST] <image>Evaluate if this chart is misleading, and if so explain [/INST]"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
# Pass images/text as keyword arguments so the call does not depend on positional order
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# inference
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=True))
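`model.generate` returns the prompt tokens followed by the newly generated ones, so the decoded string also contains the prompt. Below is a minimal sketch of two helpers, one that builds the prompt used above and one that keeps only the model's answer. `build_prompt` and `extract_answer` are hypothetical names, and the sketch assumes the Mistral-style `[INST] ... [/INST]` template shown in the code.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the [INST] template used in the example above."""
    # <image> marks where the processor inserts the image tokens.
    return f"[INST] <image>{question} [/INST]"


def extract_answer(decoded: str) -> str:
    """Keep only the text generated after the last [/INST] marker."""
    answer = decoded.rsplit("[/INST]", 1)[-1]
    # Remove the end-of-sequence token in case it was not skipped during decoding.
    return answer.replace("</s>", "").strip()


prompt = build_prompt("Evaluate if this chart is misleading, and if so explain")
```

An alternative that avoids string matching is to decode only the new tokens, for example `processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.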

Training Details

Training Data

BLP4k: a dataset of synthetically generated bar, line, and pie charts, including both misleading and non-misleading examples.

Training Procedure

Training Hyperparameters

Training regime: [More Information Needed]

Citation

BibTeX:

Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). LLaVA-NeXT: Improved reasoning, OCR, and world knowledge. Retrieved from https://llava-vl.github.io/blog/2024-01-30-llava-next/.
Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). Improved Baselines with Visual Instruction Tuning. arXiv:2310.03744.
Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). Visual Instruction Tuning. NeurIPS.
