---
license: apache-2.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
tags:
- llama
- lora
- chinese
- zh
- mllama
pipeline_tag: image-text-to-text
library_name: peft
---
# Llama-3.2-Vision-chinese-lora
- Base model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
## Features
- Trained on a large amount of high-quality Chinese text and VQA data to significantly improve the model's Chinese OCR capabilities.
## Use with transformers
```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image
# Base model ID and LoRA model ID
base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"
# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)
# Load the base model
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # Use torch.bfloat16 if your hardware supports it
).eval()
# Load the LoRA model and apply it to the base model
model = PeftModel.from_pretrained(base_model, lora_model_id)
# Optionally, merge the LoRA weights with the base model for faster inference
model = model.merge_and_unload()
# Load an example image (replace 'path_to_image.jpg' with your image file)
image_path = 'path_to_image.jpg'
image = Image.open(image_path)
# User prompt in Chinese
user_prompt = "请描述这张图片。"
# Prepare the content with the image and text
content = [
    {"type": "image", "image": image},
    {"type": "text", "text": user_prompt},
]
# Apply the chat template to create the prompt
prompt = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True,
)
# Prepare the inputs for the model
inputs = processor(
    images=image,
    text=prompt,
    return_tensors="pt",
).to(model.device)
# Generate the model's response
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens so the prompt is not echoed back
prompt_len = inputs["input_ids"].shape[-1]
response = processor.decode(output[0][prompt_len:], skip_special_tokens=True)
# Print the assistant's response
print("Assistant:", response)
```
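For OCR-style prompts or several images, the message list passed to `processor.apply_chat_template` can be built with a small helper. This is a minimal sketch and not part of this repository: `build_messages` and the OCR prompt are hypothetical, chosen only for illustration. It assumes the chat template needs just a `{"type": "image"}` placeholder per image, with the images themselves passed separately to the processor via `images=`.

```python
from typing import Any, Dict, List


def build_messages(user_prompt: str, num_images: int = 1) -> List[Dict[str, Any]]:
    """Build a single-turn chat: image placeholders first, then the text prompt.

    Hypothetical helper for illustration; it only assembles plain dicts and
    does not load or touch the model.
    """
    # One placeholder per image; the real images go to processor(images=...)
    content: List[Dict[str, Any]] = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": user_prompt})
    return [{"role": "user", "content": content}]


# Example OCR-style prompt in Chinese (an assumption; adjust to your task)
messages = build_messages("请识别并输出图片中的所有文字。")
print(messages)
```

The returned list can be passed to `processor.apply_chat_template(messages, add_generation_prompt=True)` in place of the inline message list used in the example above.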