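Llama-3.2-11B X-Ray Analysis Model (v1)
This model is a fine-tuned version of unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit, built to analyze medical radiology images (X-ray, CT, and ultrasound). It was trained on a subset of the ROCO radiology dataset (unsloth/Radiology_mini) to generate image descriptions in the role of an expert radiographer.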
Training
The model was fine-tuned with Unsloth for 2x faster training and lower memory use. Key training details:
- Base model: unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
- Dataset: unsloth/Radiology_mini
- LoRA adapters: parameter-efficient fine-tuning was used, so only a small fraction of the model's parameters is trained.
  r = 16
  lora_alpha = 16
  lora_dropout = 0
- Training hyperparameters:
  learning_rate = 2e-4
  per_device_train_batch_size = 2
  gradient_accumulation_steps = 4
  max_steps = 30 (or num_train_epochs = 1 for a full run)
  optimizer = adamw_8bit
- Fine-tuning configuration:
  finetune_vision_layers = False
  finetune_language_layers = True
  finetune_attention_modules = True
  finetune_mlp_modules = True
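A few quantities follow directly from the settings listed above. The sketch below is plain Python arithmetic, not part of the training script: it derives the effective batch size, the number of samples seen in the 30-step run, and the LoRA scaling factor lora_alpha / r. It assumes a single GPU and standard (not rank-stabilized) LoRA scaling; the dictionary and variable names are illustrative only.

```python
# Values copied from the training details above.
lora = {"r": 16, "lora_alpha": 16, "lora_dropout": 0}
training = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "max_steps": 30,
}

# Assumption: one GPU; multiply by the device count otherwise.
num_devices = 1

# Number of samples that contribute to each optimizer step.
effective_batch_size = (
    training["per_device_train_batch_size"]
    * training["gradient_accumulation_steps"]
    * num_devices
)

# Total samples processed over the short max_steps run.
samples_seen = effective_batch_size * training["max_steps"]

# Assumption: standard LoRA scaling (alpha / r), not rank-stabilized LoRA.
lora_scale = lora["lora_alpha"] / lora["r"]

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen in {training['max_steps']} steps: {samples_seen}")
print(f"LoRA scale: {lora_scale}")
```

Under these assumptions each optimizer step accumulates gradients over 8 image-caption pairs, the 30-step run sees 240 samples, and the adapter update is applied with a scale of 1.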
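Usage
The model takes an image and a text prompt as input and generates a text description. The expected input format is a conversation, as shown below: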
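from unsloth import FastVisionModel
import torch
from PIL import Image
from transformers import TextStreamer

# Load the model (to load from disk, replace "YLX1965/llama3-2-11b-xray-v1" with your local model path)
model, tokenizer = FastVisionModel.from_pretrained(
    "YLX1965/llama3-2-11b-xray-v1",
    load_in_4bit = True,  # set to False for 16-bit loading
)
FastVisionModel.for_inference(model)  # switch the model to inference mode

# Example image and instruction (replace with your own image path).
# To use a sample from the training dataset instead:
# from datasets import load_dataset
# dataset = load_dataset("unsloth/Radiology_mini", split="train")
# image = dataset[0]["image"]
# Load the file as a PIL image rather than passing the path string to the processor.
image = Image.open("path/to/your/image.jpg").convert("RGB")
instruction = "You are an expert radiologist. Describe accurately what you see in this image."

# A single user turn holding an image placeholder followed by the text prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,  # the chat template already added them
    return_tensors = "pt",
).to("cuda")

# Stream generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)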
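The generation call above samples with temperature = 1.5 and min_p = 0.1. To illustrate what those two settings do, here is a minimal pure-Python sketch of temperature scaling followed by min-p filtering, in which a token is kept only if its probability is at least min_p times that of the most likely token. The function names and toy logits are made up for this example, and this is not the transformers implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

toy_logits = [4.0, 3.0, 1.0, -2.0]  # hypothetical scores for four tokens
probs = softmax(toy_logits, temperature=1.5)
filtered = min_p_filter(probs, min_p=0.1)

print("after temperature:", [round(p, 3) for p in probs])
print("after min-p:      ", [round(p, 3) for p in filtered])
```

A temperature above 1 flattens the distribution and makes the output more varied, while the min-p cutoff removes the unlikely tail; in this toy case only the least likely token is dropped.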