---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
tags:
- llama
- lora
- chinese
- zh
- mllama
pipeline_tag: image-text-to-text
library_name: peft
---

# Llama-3.2-Vision-chinese-lora
- Base model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)

## Features

- Fine-tuned on a large amount of high-quality Chinese text and VQA data, which significantly enhances the model's Chinese OCR capabilities.

## Use with transformers
```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image

# Base model ID and LoRA adapter ID
base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"

# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load the base model
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # Use torch.bfloat16 if your hardware supports it
).eval()

# Load the LoRA adapter and apply it to the base model
model = PeftModel.from_pretrained(base_model, lora_model_id)

# Optionally, merge the LoRA weights into the base model for faster inference
model = model.merge_and_unload()

# Load an example image (replace 'path_to_image.jpg' with your image file)
image_path = "path_to_image.jpg"
image = Image.open(image_path)

# User prompt in Chinese ("Please describe this image.")
user_prompt = "请描述这张图片。"

# Prepare the message content: an image placeholder plus the text prompt
content = [
    {"type": "image"},
    {"type": "text", "text": user_prompt},
]

# Apply the chat template to build the prompt string
prompt = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True,
)

# Prepare the model inputs (tokenized text plus pixel values)
inputs = processor(
    images=image,
    text=prompt,
    return_tensors="pt",
).to(model.device)

# Generate the model's response
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt portion
response = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

# Print the assistant's response
print("Assistant:", response)
```
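If GPU memory is limited, the base model can also be loaded with 4-bit quantization and used with the adapter left unmerged. The sketch below is illustrative rather than part of the original card; it assumes the `bitsandbytes` package is installed and that your transformers version supports `BitsAndBytesConfig`.

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"

# 4-bit NF4 quantization to reduce GPU memory usage (requires bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(base_model_id)
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    quantization_config=bnb_config,
).eval()

# Keep the LoRA adapter unmerged; merging into a quantized base model is not recommended
model = PeftModel.from_pretrained(base_model, lora_model_id)
```

Generation then works the same way as in the example above, just without the `merge_and_unload()` step.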