Uploaded finetuned model

  • Developed by: mgbam
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

This mllama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

MGBAM Unsloth Fine-Tuned Model for Radiological Image Interpretation

This repository contains the fine-tuned MGBAM Unsloth model, a vision-language model specialized in generating detailed, expert-level descriptions of radiological images. The model was fine-tuned with a parameter-efficient method (LoRA) and 4-bit quantization to enable efficient inference on resource-constrained hardware (e.g., Colab T4 GPUs).

Model Overview

MGBAM Unsloth Fine-Tuned Model leverages advanced techniques such as:

  • LoRA (Low-Rank Adaptation): Enables fine-tuning with a small set of trainable parameters.
  • 4-bit Quantization: Reduces memory footprint while maintaining performance, allowing inference on GPUs with limited VRAM.
  • Chat-Style Interface: Accepts both image inputs and textual instructions for generating detailed descriptions.

This model is designed to assist in radiological image interpretation by generating descriptive text based on expert-level instructions.
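
As a sketch of how these pieces fit together at training time, Unsloth loads the 4-bit base model and wraps it with LoRA adapters. The exact LoRA rank, target modules, and other hyperparameters used for this checkpoint are not published here, so the values below are illustrative:

from unsloth import FastVisionModel

# Load the 4-bit quantized base model listed above.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit",
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices are trained,
# while the quantized base weights stay frozen.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,    # adapt the vision encoder
    finetune_language_layers=True,  # adapt the language model
    r=16,                           # LoRA rank (illustrative)
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
)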

Intended Use

Primary Use Cases

  • Biomedical Research: Aid in developing and evaluating computer-aided diagnosis tools.
  • Radiology Training: Provide descriptive feedback to support educational initiatives in medical imaging.
  • Prototyping: Serve as a component for research projects exploring automated image analysis.

Limitations

  • Domain Specificity: The model is fine-tuned specifically for radiological images and may not generalize well to non-medical imagery.
  • Clinical Reliability: The outputs are intended for research and educational purposes only. They should not replace professional medical diagnosis.
  • Quantization Effects: While 4-bit quantization enables efficient inference, it may lead to slight performance variations compared to full-precision models (see the loading note below).
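
For comparison against higher-precision inference, the same FastVisionModel loader shown in the next section can be called with quantization disabled. This is a sketch, assuming a GPU with enough memory for the 16-bit weights (roughly 22 GB or more for an 11B model):

from unsloth import FastVisionModel

# Load the checkpoint without 4-bit quantization as a higher-precision baseline.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="mgbam/unsloth_finetune",
    load_in_4bit=False,  # 16-bit weights; needs substantially more VRAM
)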

How to Use

Below is an example Python snippet demonstrating how to run the model using the FastVisionModel helper. This code loads the model in 4-bit mode, processes an image with a custom chat-style prompt, and streams the generated output.

from unsloth import FastVisionModel
from PIL import Image
from transformers import TextStreamer

# Load the model and tokenizer from the MGBAM fine-tuned model in 4-bit mode.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="mgbam/unsloth_finetune",  # Use your fine-tuned MGBAM model
    load_in_4bit=True                     # Set to False for 16-bit LoRA if preferred
)
FastVisionModel.for_inference(model)      # Enable inference optimizations

# Load a local image (replace the path with your own radiograph).
image = Image.open("/content/1740468056180.jpeg").convert("RGB")

# Define the instruction for the model.
instruction = "You are an expert radiographer. Describe accurately what you see in this image."

# Create a chat-style message combining the image placeholder and the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]
    }
]

# Generate the input prompt using the custom chat template.
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Tokenize the image and the text prompt together.
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")

# Set up a TextStreamer to stream the output as it is generated.
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Generate the output text using the model.
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1
)
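
The temperature=1.5 and min_p=0.1 settings follow Unsloth's example notebooks; lower the temperature (or disable sampling) for more deterministic reports. To capture the generated text as a string rather than streaming it to stdout, one option (a sketch, assuming the tokenizer call above returned an input_ids tensor) is to decode only the newly generated tokens:

# Generate without a streamer and keep the output token IDs.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)

# Strip the prompt tokens and decode only the model's continuation.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
report = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(report)
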
Training Details

  • Base Model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit (Llama 3.2 11B Vision Instruct, pre-quantized by Unsloth).
  • Fine-Tuning Approach: LoRA for parameter-efficient fine-tuning, combined with 4-bit quantization to reduce VRAM usage.
  • Dataset: A curated dataset of radiological images paired with expert-level captions. (Additional dataset details can be provided here.)
  • Compute Environment: GPUs with limited VRAM (e.g., a Colab T4), using mixed precision and efficient quantization techniques.
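
A sketch of what such a setup looks like with Unsloth and TRL's SFTTrainer follows. The dataset column names, hyperparameters, and collator settings used for this checkpoint are not published, so everything below is illustrative:

from unsloth import is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

# Convert one (image, caption) pair into the chat format used at inference time.
# "image" and "caption" are hypothetical column names, used only for illustration.
def convert_to_conversation(sample):
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": "You are an expert radiographer. Describe accurately what you see in this image."},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["caption"]},
            ]},
        ]
    }

trainer = SFTTrainer(
    model=model,                      # the LoRA-wrapped model from Model Overview
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_dataset,  # hypothetical: [convert_to_conversation(s) for s in raw_dataset]
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=30,                 # illustrative; use num_train_epochs for a full run
        fp16=not is_bf16_supported(), # mixed precision on T4-class GPUs
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
        output_dir="outputs",
        remove_unused_columns=False,  # keep the image column for the collator
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()
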
Citation

If you use this model in your research, please cite:

@misc{mgbam_unsloth_finetuned,
  title={MGBAM Unsloth Fine-Tuned Model for Radiological Image Interpretation},
  author={mgbam},
  year={2024},
  howpublished={\url{https://huggingface.co/mgbam/unsloth_finetune}},
  note={This model is provided for research and educational purposes only.}
}
License

This model is released under the Apache 2.0 license.

Acknowledgements

We thank the Hugging Face community, the Unsloth team behind FastVisionModel, and the medical experts whose insights have contributed to the fine-tuning process.

This model card provides an overview of the fine-tuned MGBAM Unsloth model, its training details, intended use cases, and inference instructions. For updates and additional information, please refer to the repository documentation.
