metadata

language: en
license: mit
tags:
  - image-to-json
  - fine-tuning
datasets:
  - naver-clova-ix/cord-v2

Fine-Tuned LLAVA Model

This repository hosts the fine-tuned LLAVA model files, adapted for parsing receipt images and extracting their contents as JSON. The model was fine-tuned on the CORD-v2 dataset.

Model Details

Model Versions

LLAVA 1.6 Mistral 7B
Fine-tuned on the CORD-v2 dataset.

How to Use

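You can load and use this model directly from the Hugging Face Hub with the transformers library. The example below loads the model in 4-bit precision (which requires the bitsandbytes package and a CUDA GPU) and runs inference on a single receipt image: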

import io

import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"
REPO_ID = "Farzad-R/llava-v1.6-mistral-7b-cordv2"

processor = AutoProcessor.from_pretrained(MODEL_ID)

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    REPO_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

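# image_bytes must already hold the raw bytes of a receipt image (for example, read from a file or an upload)
image = Image.open(io.BytesIO(image_bytes))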

# Prepare input
prompt = "[INST] <image>\nExtract JSON [/INST]"
max_output_token = 256
# Keyword arguments avoid depending on the processor's positional argument order
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=max_output_token)
response = processor.decode(output[0], skip_special_tokens=True)

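# Convert response to JSON
# Note: token2json is not defined in this snippet. transformers exposes a method
# of the same name on DonutProcessor; a standalone equivalent is assumed here.
generated_json = token2json(response)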
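
The `token2json` call above is not defined in the snippet. The Donut processor in transformers provides a method of that name, which converts `<s_key>value</s_key>` markup into nested dictionaries. The sketch below is a simplified standalone variant of that idea, not the function used in this repository. It assumes the fine-tuned model emits Donut-style tags, which this model card does not confirm. The name `token2json_sketch` and the sample string are hypothetical.

```python
import re


def token2json_sketch(tokens: str, is_inner_value: bool = False):
    """Parse Donut-style <s_key>value</s_key> markup into nested dicts and lists.

    Simplified illustration only; assumes the model output uses this tag format.
    """
    output = {}
    while tokens:
        start = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
        if start is None:
            break
        key = start.group(1)
        start_tag = start.group()
        end = re.search(rf"</s_{re.escape(key)}>", tokens, re.IGNORECASE)
        if end is None:
            # Unclosed tag: drop it and keep scanning the rest of the string.
            tokens = tokens.replace(start_tag, "")
            continue
        end_tag = end.group()
        content = re.search(
            f"{re.escape(start_tag)}(.*?){re.escape(end_tag)}",
            tokens,
            re.IGNORECASE | re.DOTALL,
        )
        if content is not None:
            body = content.group(1).strip()
            if "<s_" in body and "</s_" in body:
                # Nested fields: recurse and unwrap single-element lists.
                value = token2json_sketch(body, is_inner_value=True)
                if value:
                    output[key] = value[0] if len(value) == 1 else value
            else:
                # Leaf field: <sep/> separates multiple values.
                leaves = [leaf.strip() for leaf in body.split("<sep/>")]
                output[key] = leaves[0] if len(leaves) == 1 else leaves
        tokens = tokens[tokens.find(end_tag) + len(end_tag):].strip()
        if tokens.startswith("<sep/>"):
            # A separator between sibling groups starts a new list item.
            return [output] + token2json_sketch(tokens[6:], is_inner_value=True)
    if output:
        return [output] if is_inner_value else output
    return [] if is_inner_value else {"text_sequence": tokens}


# Hypothetical model output, including the echoed prompt prefix
sample = (
    "[INST] \nExtract JSON [/INST] "
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>"
    "<s_total><s_total_price>4.50</s_total_price></s_total>"
)
parsed = token2json_sketch(sample)
```

Because the parser only looks for `<s_...>` tags, the prompt text that `processor.decode` echoes back at the start of the response is ignored. If no tags are found, the raw text is returned under the `text_sequence` key so the caller can inspect it.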

To see the fine-tuning process and training configuration, please visit this GitHub repository.

Additional Resources

Link to Hyperstack Cloud
GitHub Repository for Fine-Tuning LLAVA
A link to a YouTube video will be added here soon to provide further insights and demonstrations.