Update README.md
README.md
CHANGED

To use this model for generating captions for medical images, you can load it using the Hugging Face Transformers library. Below is an example of how to load the model and generate a caption for an image:

```python
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig
import torch
from PIL import Image
# Configure 4-bit quantization to reduce memory usage
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
# Load the processor and model from Hugging Face
processor = AutoProcessor.from_pretrained("Rishi1708/LLaMA3.2-11B-VisionInstruct-MedicalXray-ScanAnalysis-v1.0")
model = AutoModelForVision2Seq.from_pretrained(
    "Rishi1708/LLaMA3.2-11B-VisionInstruct-MedicalXray-ScanAnalysis-v1.0",
    quantization_config=quantization_config,
    device_map="auto"
)
# Prepare the input
# Load an X-ray image from a local path (replace with your image path)
image_path = "xray.jpg"
image = Image.open(image_path).convert("RGB")
# Define the text prompt
prompt = "Analyze this X-ray image and describe any abnormalities."
# Process the inputs (text and image) into a format the model expects
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
# Generate the output
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        attention_mask=inputs["attention_mask"],
        aspect_ratio_ids=inputs["aspect_ratio_ids"],
        aspect_ratio_mask=inputs["aspect_ratio_mask"],
        max_new_tokens=100,
        do_sample=False  # Set to True for sampling-based generation if needed
    )
# Decode the generated output into readable text
generated_text = processor.decode(outputs[0], skip_special_tokens=True)
# Print the result
print("Model Output:", generated_text)
```
**Note:** The exact method to prepare inputs and generate outputs may depend on the specific model architecture. Please refer to the base model's documentation for detailed usage instructions.
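
In particular, Llama 3.2 Vision (Mllama) processors typically expect an `<|image|>` placeholder in the prompt; building the prompt through the processor's chat template inserts it automatically. The following is a minimal sketch of that alternative input preparation, assuming this checkpoint ships the standard Mllama processor and chat template (not verified here); it reuses the `processor`, `model`, and `image` objects from the example above.

```python
# Alternative input preparation via the chat template (sketch).
# Assumes a standard Llama 3.2 Vision / Mllama processor; adjust if this checkpoint differs.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Analyze this X-ray image and describe any abnormalities."},
        ],
    }
]
# The template adds the <|image|> placeholder and the generation prompt for us
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(outputs[0], skip_special_tokens=True))
```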

- `transformers`
- `torch`
- `Pillow` (for image handling)
- `bitsandbytes`
- `accelerate`

Install these using:
```bash
pip install transformers torch Pillow bitsandbytes accelerate
```
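
As a quick sanity check of the environment, the short sketch below prints the installed package versions and whether a CUDA GPU is visible (4-bit loading with `bitsandbytes` generally expects one):

```python
# Quick environment check: package versions and GPU visibility
import importlib
import torch

for pkg in ("transformers", "torch", "PIL", "bitsandbytes", "accelerate"):
    print(pkg, importlib.import_module(pkg).__version__)
print("CUDA available:", torch.cuda.is_available())
```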

## Evaluation