Farzad-R/llava-v1.6-mistral-7b-cordv2

	---
	language: en
	license: mit
	tags:
	- image-to-json
	- fine-tuning
	datasets:
	- naver-clova-ix/cord-v2
	---

	# Fine-Tuned LLAVA Model

	This repository hosts the fine-tuned LLAVA model files, adapted for parsing receipt images and extracting their contents as JSON. The model was fine-tuned on the [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset.

	## Model Details

	### Model Versions
	- LLAVA 1.6 Mistral 7B, fine-tuned on the Cord-V2 dataset.

	## How to Use

	You can load and use this model directly from the Hugging Face Hub with the `transformers` library. The example below loads the model and runs inference on a receipt image:

	```python
	import io

	import torch
	from PIL import Image
	from transformers import AutoProcessor, BitsAndBytesConfig, LlavaNextForConditionalGeneration

	MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"  # base model, used here for the processor
	REPO_ID = "Farzad-R/llava-v1.6-mistral-7b-cordv2"  # this fine-tuned model

	processor = AutoProcessor.from_pretrained(MODEL_ID)

	# Load the model in 4-bit NF4 precision to reduce GPU memory use
	quantization_config = BitsAndBytesConfig(
	    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
	)
	model = LlavaNextForConditionalGeneration.from_pretrained(
	    REPO_ID,
	    torch_dtype=torch.float16,
	    quantization_config=quantization_config,
	)

	# `image_bytes` holds the raw bytes of a receipt image and must be supplied by the caller
	image = Image.open(io.BytesIO(image_bytes))

	# Prepare input
	prompt = "[INST] <image>\nExtract JSON [/INST]"
	max_output_token = 256
	inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
	output = model.generate(**inputs, max_new_tokens=max_output_token)
	response = processor.decode(output[0], skip_special_tokens=True)

	# Convert response to JSON
	# `token2json` is not defined or imported in this snippet; it must be provided separately
	generated_json = token2json(response)
	```
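The snippet above calls `token2json` without defining it. As a rough illustration of what such a helper might do, here is a minimal sketch that assumes the model emits Donut-style tagged sequences (`<s_key>value</s_key>`, with `<sep/>` between list items). That output format is an assumption this README does not confirm, and `parse_tagged_sequence` is a hypothetical name loosely modeled on the Donut processor's `token2json`, not the helper from the fine-tuning repository.

```python
import re

SEP = "<sep/>"  # assumed separator between list items


def parse_tagged_sequence(tokens: str, inner: bool = False):
    """Hypothetical helper: parse `<s_key>value</s_key>` markup into dicts/lists.

    Simplified sketch; it does not handle a key nested inside itself.
    """
    output = {}
    tokens = tokens.strip()
    while tokens:
        start = re.search(r"<s_(.*?)>", tokens)
        if start is None:
            break
        key = start.group(1)
        close_tag = f"</s_{key}>"
        close_idx = tokens.find(close_tag, start.end())
        if close_idx == -1:
            # Unclosed tag: drop it and keep scanning the rest.
            tokens = tokens[: start.start()] + tokens[start.end():]
            continue
        content = tokens[start.end():close_idx].strip()
        if "<s_" in content and "</s_" in content:
            # Non-leaf node: recurse into the nested fields.
            value = parse_tagged_sequence(content, inner=True)
            if value:
                output[key] = value[0] if len(value) == 1 else value
        else:
            # Leaf node: a single string, or a list when separators are present.
            leaves = [leaf.strip() for leaf in content.split(SEP)]
            output[key] = leaves[0] if len(leaves) == 1 else leaves
        tokens = tokens[close_idx + len(close_tag):].strip()
        if tokens.startswith(SEP):
            # A separator after a closed group starts the next item of a list.
            return [output] + parse_tagged_sequence(tokens[len(SEP):], inner=True)
    if output:
        return [output] if inner else output
    # Nothing parseable: return the raw text so the caller can inspect it.
    return [] if inner else {"text_sequence": tokens}


# Made-up example string, not real model output
example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu>"
    "<s_total><s_total_price>7.50</s_total_price></s_total>"
)
parsed = parse_tagged_sequence(example)
```

Text before the first tag is skipped, so the decoded response can be passed in with the prompt still attached. If the model returns plain JSON text instead of tags, `json.loads` on the part after `[/INST]` would be the simpler route.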
	---
	To see the fine-tuning process and training configuration, please visit [this GitHub repository](https://github.com/Farzad-R/Finetune-LLAVA-NEXT).
	---

	## Additional Resources

	- [Link to Hyperstack Cloud](https://www.hyperstack.cloud/?utm_source=Influencer&utm_medium=AI%20Round%20Table&utm_campaign=Video%201)
	- [GitHub Repository for Fine-Tuning LLAVA](https://github.com/Farzad-R/Finetune-LLAVA-NEXT)
	- A link to a YouTube video will be added here soon to provide further insights and demonstrations.