---
language: en
license: mit
tags:
- image-to-json
- fine-tuning
datasets:
- naver-clova-ix/cord-v2
---

# Fine-Tuned LLaVA Model

This repository hosts the fine-tuned LLaVA model files, adapted for parsing receipt images and extracting their contents as JSON. The model was fine-tuned on the [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset.

## Model Details

### Model Versions

- **LLaVA 1.6 Mistral 7B**: fine-tuned on the CORD-v2 dataset.

## How to Use

You can load and use this model directly from the Hugging Face Hub with the `transformers` library. Below is an example of how to load the model and run inference:

```python
import io

import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"
REPO_ID = "Farzad-R/llava-v1.6-mistral-7b-cordv2"

# The processor comes from the base model; the fine-tuned weights come from this repo
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load the model in 4-bit (NF4) to reduce GPU memory usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    REPO_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# image_bytes should contain the raw bytes of a receipt image
image = Image.open(io.BytesIO(image_bytes))

# Prepare the input; the <image> placeholder marks where the image is inserted
prompt = "[INST] <image>\nExtract JSON [/INST]"
max_output_token = 256

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=max_output_token)
response = processor.decode(output[0], skip_special_tokens=True)

# Convert the generated token sequence to JSON
# (token2json is a helper defined in the fine-tuning repository linked below)
generated_json = token2json(response)
```

---

To see the fine-tuning process and training configuration, please visit [this GitHub repository](https://github.com/Farzad-R/Finetune-LLAVA-NEXT).
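The `token2json` call above converts the model's CORD-style output tokens (e.g. `<s_nm>Coffee</s_nm>`) into nested JSON. As a rough illustration of what such a helper does, here is a minimal sketch; the function name `simple_token2json` and the exact tag handling are assumptions for illustration, and a production helper would also need to handle repeated keys and stray special tokens:

```python
import re

def simple_token2json(text: str) -> dict:
    """Minimal sketch: parse Donut/CORD-style tag sequences such as
    <s_menu><s_nm>Coffee</s_nm></s_menu> into a nested dict."""
    result = {}
    # Match an opening <s_key> tag and its matching </s_key> close tag
    pattern = re.compile(r"<s_(.+?)>(.*?)</s_\1>", re.DOTALL)
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            break
        key, inner = match.group(1), match.group(2)
        # Recurse when the value itself contains nested tags
        result[key] = simple_token2json(inner) if "<s_" in inner else inner.strip()
        pos = match.end()
    return result

# Example:
# simple_token2json("<s_menu><s_nm>Coffee</s_nm><s_price>3.00</s_price></s_menu>")
# -> {"menu": {"nm": "Coffee", "price": "3.00"}}
```

This is only a sketch of the idea; use the helper from the fine-tuning repository for actual decoding.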
---

## Additional Resources

- [Link to Hyperstack Cloud](https://www.hyperstack.cloud/?utm_source=Influencer&utm_medium=AI%20Round%20Table&utm_campaign=Video%201)
- [GitHub Repository for Fine-Tuning LLaVA](https://github.com/Farzad-R/Finetune-LLAVA-NEXT)
- A link to a YouTube video will be added here soon to provide further insights and demonstrations.