library_name: transformers
language:
- en
datasets:
- chart-misinformation-detection/bar_line_pie_4k
pipeline_tag: image-text-to-text
tags:
- chart
Model Card for LlavaNext BLP4k
This is a LlavaNext model fine-tuned on a synthetic dataset of bar, line, and pie charts. The goal is to detect whether a chart image contains a misleading element. The types of misleading elements covered are limited to: a non-zero baseline in bar charts, omitted x-axis data points in line charts, and pie-chart segments that do not sum to 100%.
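For intuition, here is a minimal sketch (not part of the dataset-generation code; all names are hypothetical) of the first misleading-element type, a bar chart whose truncated, non-zero y-axis baseline exaggerates small differences:

```python
# Hypothetical illustration of a "non-zero baseline" misleading bar chart,
# one of the three misleading-element types this model detects.
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
values = [96, 98, 100]

fig, (ax_ok, ax_bad) = plt.subplots(1, 2, figsize=(8, 3))

ax_ok.bar(categories, values)
ax_ok.set_title("Honest: baseline at 0")

ax_bad.bar(categories, values)
ax_bad.set_ylim(95, 101)  # truncated y-axis exaggerates small differences
ax_bad.set_title("Misleading: non-zero baseline")

plt.tight_layout()
plt.savefig("baseline_example.png")
```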
Model Details
Model Description
This is the model card of a 🤗 Transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: Team Snoopy
- Model type: Multimodal Image + Text
- Finetuned from model: LlavaNext
Model Sources
- Repository: LlavaNext blp-4k (adapters only)
- Demo: LlavaNext BLP4k
Uses
Direct Use
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model. It requires a GPU, since the base model is loaded in 4-bit with bitsandbytes.
# Load the 4-bit quantized base model and attach the fine-tuned LoRA adapters
from transformers import (
    AutoProcessor,
    LlavaNextForConditionalGeneration,
    BitsAndBytesConfig,
)
from peft import PeftModel
from PIL import Image
import requests
import torch

base_model = "llava-hf/llava-v1.6-mistral-7b-hf"
adapter_weights_repo = "chart-misinformation-detection/hf-llava-next-finetune-blp4k"

# 4-bit NF4 quantization keeps the 7B base model within a single GPU
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(base_model)
model = LlavaNextForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_weights_repo)

# Preprocess input: the Mistral-based LLaVA-NeXT chat format wraps the
# instruction in [INST] ... [/INST] with an <image> placeholder
prompt = "[INST] <image>\nEvaluate if this chart is misleading, and if so explain [/INST]"
image_url = "https://example.com/chart.png"  # replace with the URL of your chart image
image = Image.open(requests.get(image_url, stream=True).raw)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Inference
output = model.generate(**inputs, max_new_tokens=500)
print(processor.decode(output[0], skip_special_tokens=True))
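The decoded string includes the prompt as well as the answer. A minimal sketch for printing only the newly generated tokens, assuming the `inputs` and `output` variables from the snippet above (note that on recent transformers versions the processor expands the `<image>` placeholder, so the input length matches the prompt portion of the output):

```python
# Keep only the tokens generated after the prompt
# (assumes `inputs` and `output` from the snippet above).
generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))
```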
Training Details
Training Data
BLP4k dataset (a synthetic dataset of bar, line, and pie charts, including both misleading and non-misleading examples)
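A minimal sketch for inspecting the training data, using the dataset ID from this card's metadata; the split and column layout are assumptions, so check the dataset card:

```python
# Minimal sketch: inspect the BLP4k training data.
# The dataset ID comes from this card's metadata; the "train" split and
# column names are assumptions — check the dataset card.
from datasets import load_dataset

ds = load_dataset("chart-misinformation-detection/bar_line_pie_4k", split="train")
print(ds)     # features and number of rows
print(ds[0])  # one example (chart image plus annotation)
```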
Training Procedure
Training Hyperparameters
- Training regime: [More Information Needed]
Citation
BibTeX:
@misc{liu2024llavanext,
  title         = {LLaVA-NeXT: Improved reasoning, OCR, and world knowledge},
  author        = {Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae},
  year          = {2024},
  month         = {January},
  url           = {https://llava-vl.github.io/blog/2024-01-30-llava-next/}
}
@misc{liu2023improvedllava,
  title         = {Improved Baselines with Visual Instruction Tuning},
  author        = {Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae},
  year          = {2023},
  eprint        = {2310.03744},
  archivePrefix = {arXiv}
}
@inproceedings{liu2023llava,
  title     = {Visual Instruction Tuning},
  author    = {Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
  booktitle = {NeurIPS},
  year      = {2023}
}