---
language: en
license: mit
tags:
- image-to-json
- fine-tuning
datasets:
- naver-clova-ix/cord-v2
---
# Fine-Tuned LLAVA Model

This repository hosts the fine-tuned LLAVA model files. The model has been adapted to parse receipt images and extract their contents as JSON. It was fine-tuned on the [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset.

## Model Details

### Model Versions

- **LLAVA 1.6 Mistral 7B**: fine-tuned on the CORD-v2 dataset.

## How to Use

You can load this model directly from the Hugging Face Hub with the `transformers` library. The example below loads the fine-tuned weights in 4-bit precision, which requires a CUDA GPU and the `bitsandbytes` package, and then extracts JSON from a receipt image:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"  # base model, provides the processor
REPO_ID = "Farzad-R/llava-v1.6-mistral-7b-cordv2"  # fine-tuned weights

processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load the fine-tuned weights with 4-bit NF4 quantization (CUDA + bitsandbytes)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    REPO_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# Load a receipt image; "receipt.jpg" is a placeholder path.
# From raw bytes instead: Image.open(io.BytesIO(image_bytes)) with `import io`
image = Image.open("receipt.jpg").convert("RGB")

# Prepare input in the Mistral instruction format
prompt = "[INST] <image>\nExtract JSON [/INST]"
max_output_token = 256
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=max_output_token)

# Decode only the newly generated tokens so the prompt is not echoed back
generated_ids = output[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Convert response to JSON. token2json is not part of transformers; it is
# assumed to be the Donut-style helper from the fine-tuning repository.
generated_json = token2json(response)
```
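The example above calls `token2json`, which `transformers` does not provide. Assuming the model emits Donut-style tag sequences (`<s_key>value</s_key>` for each field, with `<sep/>` between sibling items), a minimal sketch of such a helper, modeled on the approach of Donut's `token2json`, could look like the following. It is not necessarily the exact helper used in the fine-tuning repository, and the sample sequence in the demo is hypothetical, included only to illustrate the assumed format.

```python
import re
from typing import Any, Union


def token2json(tokens: str, is_inner_value: bool = False) -> Union[dict, list]:
    """Parse a Donut-style tag sequence into nested dicts and lists.

    Assumed format: <s_key>value</s_key> for fields, with <sep/> separating
    sibling list items. Leaf values are returned as strings.
    """
    output: dict[str, Any] = {}
    while tokens:
        start = re.search(r"<s_(.*?)>", tokens)
        if start is None:
            break  # no more opening tags
        key = start.group(1)
        end_tag = f"</s_{key}>"
        end_idx = tokens.find(end_tag, start.end())
        if end_idx == -1:
            # Unclosed tag (e.g. generation hit max_new_tokens): drop it and retry.
            tokens = tokens.replace(start.group(), "", 1)
            continue
        content = tokens[start.end():end_idx].strip()
        if "<s_" in content and "</s_" in content:
            # Non-leaf node: parse the nested fields recursively.
            value = token2json(content, is_inner_value=True)
            if value:
                output[key] = value[0] if len(value) == 1 else value
        else:
            # Leaf node: a single string, or a list if <sep/> separates several.
            leaves = [leaf.strip() for leaf in content.split("<sep/>")]
            output[key] = leaves[0] if len(leaves) == 1 else leaves
        tokens = tokens[end_idx + len(end_tag):].strip()
        if tokens.startswith("<sep/>"):
            # Another sibling item follows: collect it into the same list.
            return [output] + token2json(tokens[len("<sep/>"):], is_inner_value=True)
    if output:
        return [output] if is_inner_value else output
    # Nothing parseable: fall back to the raw text.
    return [] if is_inner_value else {"text_sequence": tokens}


if __name__ == "__main__":
    # Hypothetical model output, used only to illustrate the assumed format.
    sample = (
        "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
        "<s_nm>Bagel</s_nm><s_price>2.00</s_price></s_menu>"
        "<s_total><s_total_price>6.50</s_total_price></s_total>"
    )
    print(token2json(sample))
    # → {'menu': [{'nm': 'Latte', 'price': '4.50'}, {'nm': 'Bagel', 'price': '2.00'}], 'total': {'total_price': '6.50'}}
```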
---

For details of the fine-tuning process and the training configuration, see the [Finetune-LLAVA-NEXT](https://github.com/Farzad-R/Finetune-LLAVA-NEXT) GitHub repository.

---

## Additional Resources

- [Hyperstack Cloud](https://www.hyperstack.cloud/?utm_source=Influencer&utm_medium=AI%20Round%20Table&utm_campaign=Video%201)
- [GitHub Repository for Fine-Tuning LLAVA](https://github.com/Farzad-R/Finetune-LLAVA-NEXT)
- A link to a YouTube video with further insights and demonstrations will be added here soon.