---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
tags:
- llama
- lora
- chinese
- zh
- mllama
pipeline_tag: image-text-to-text
library_name: peft
---
# Llama-3.2-Vision-chinese-lora
- base model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
## Features
- Fine-tuned on a large amount of high-quality Chinese text and VQA data, which significantly enhances the model's Chinese OCR capabilities.
## Use with transformers
```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image
# Base model ID and LoRA model ID
base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"
# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)
# Load the base model
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16  # Use torch.bfloat16 if your hardware supports it
).eval()
# Load the LoRA model and apply it to the base model
model = PeftModel.from_pretrained(base_model, lora_model_id)
# Optionally, merge the LoRA weights with the base model for faster inference
model = model.merge_and_unload()
# Load an example image (replace 'path_to_image.jpg' with your image file)
image_path = 'path_to_image.jpg'
image = Image.open(image_path)
# User prompt in Chinese ("Please describe this image.")
user_prompt = "请描述这张图片。"
# Prepare the message content: an image placeholder plus the text prompt
# (the image itself is passed to the processor below)
content = [
    {"type": "image"},
    {"type": "text", "text": user_prompt}
]
# Apply the chat template to create the prompt
prompt = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True
)
# Prepare the inputs for the model
inputs = processor(
    images=image,
    text=prompt,
    add_special_tokens=False,  # the chat template already adds the BOS token
    return_tensors="pt"
).to(model.device)
# Generate the model's response
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (skip the echoed prompt)
response = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)
# Print the assistant's response
print("Assistant:", response)
```
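For interactive use, you can optionally stream the response token by token instead of waiting for `generate` to finish. A minimal sketch using `TextStreamer` from transformers, reusing the `model`, `processor`, and `inputs` from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt hides the echoed input
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```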
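## Save the merged model (optional)
After `merge_and_unload()`, the LoRA weights are folded into the base model, so you can save the result once and later reload it directly with transformers, without peft. A minimal sketch; the output directory name below is only an example:

```python
# Save the merged model and processor (the directory name is illustrative)
save_dir = "./llama-3.2-vision-chinese-merged"
model.save_pretrained(save_dir)
processor.save_pretrained(save_dir)

# Later, reload the merged model directly
model = MllamaForConditionalGeneration.from_pretrained(
    save_dir,
    device_map="auto",
    torch_dtype=torch.float16
).eval()
processor = AutoProcessor.from_pretrained(save_dir)
```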