=====CLIP-ViT-L-14-448px-MedICaT-ROCO=====

Pretrained biomedical CLIP model with a higher input resolution (448 px). Suitable for many medical downstream tasks, such as the zero-shot modality classification shown below.

Dataset: MedICaT-200k, ROCO-80k

Base model: [https://huggingface.co/ryanyip7777/pmc_vit_l_14]

Training config:
img-size: 448
lr: 1.024e-6
epoch: 6
batchsize: 16

Benchmark: ROCO-validation-8785samples

model                             clip_val_loss  image_to_text_mean_rank  image_to_text_R@10  text_to_image_mean_rank  text_to_image_R@10
pmc_vit_l_14                      0.6886         41.4641                  0.6263              54.4236                  0.6410
CLIP-ViT-L-14-448px-MedICaT-ROCO  0.3266         34.4018                  0.6748              42.0458                  0.6791
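
For readers unfamiliar with the retrieval columns, here is a minimal sketch of how mean rank and R@k are commonly computed from an image-by-text similarity matrix. It assumes the usual definitions (rank counted from 1, the matching caption for image i sits at index i); the exact evaluation code used for the table above is not shown in this card, and the function name and toy values are made up for illustration. Plain Python is used so the sketch runs without dependencies.

```python
def retrieval_metrics(sim, k=10):
    """Compute mean rank and recall@k from a square similarity matrix.

    sim[i][j] is the similarity between query i and candidate j.
    ASSUMPTION: the correct candidate for query i is at index i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of candidates scoring strictly higher than the true match.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        f"R@{k}": sum(1 for r in ranks if r <= k) / n,
    }


def transpose(matrix):
    """Swap rows and columns so texts become the queries."""
    return [list(col) for col in zip(*matrix)]


# Toy 3x3 image-by-text similarity matrix (made-up values, not model output).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]

image_to_text = retrieval_metrics(sim, k=1)
text_to_image = retrieval_metrics(transpose(sim), k=1)
print(image_to_text)
print(text_to_image)
```

Under these definitions a lower mean rank and a higher R@10 are better, which is the direction of the change between the two rows of the table.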

We use the open_clip code base [https://github.com/mlfoundations/open_clip].
To load this model, add your own model config under ./open_clip-main/src/open_clip/model_configs.
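
As a rough illustration of what such a config might contain, the sketch below builds a model config dict in the shape open_clip uses for its JSON files and writes it to disk. The values are an assumption: they are the standard ViT-L/14 CLIP dimensions with only the image size raised to 448, and have not been checked against this model's actual config. The file name `ViT-L-14-448.json` and the helper `write_config` are hypothetical.

```python
import json
from pathlib import Path

# ASSUMPTION: standard ViT-L/14 CLIP dimensions, with image_size set to 448.
# The real config for this model may differ; verify before use.
config = {
    "embed_dim": 768,
    "vision_cfg": {"image_size": 448, "layers": 24, "width": 1024, "patch_size": 14},
    "text_cfg": {"context_length": 77, "vocab_size": 49408, "width": 768, "heads": 12, "layers": 12},
}

# Directory taken from the instructions above; the file name is hypothetical.
config_path = Path("open_clip-main/src/open_clip/model_configs") / "ViT-L-14-448.json"


def write_config(path=config_path):
    """Write the config as JSON, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return path


# With 14-pixel patches, a 448-pixel image is split into a 32 x 32 grid.
patches_per_side = config["vision_cfg"]["image_size"] // config["vision_cfg"]["patch_size"]
num_patches = patches_per_side ** 2
print(patches_per_side, num_patches)
```

Calling `write_config()` from the directory that contains `open_clip-main` would place the file where the instructions above expect it.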

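import torch
from PIL import Image
import open_clip

# Load the model, its image preprocessing transform, and the matching tokenizer from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
tokenizer = open_clip.get_tokenizer('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
model.eval()  # disable training-only behavior for inference

# Preprocess one image into a batch of size 1 and tokenize the candidate labels.
image = preprocess(Image.open("xray.png")).unsqueeze(0)
text = tokenizer(["xray", "CT", "MRI"])

# torch.autocast("cuda") replaces the deprecated torch.cuda.amp.autocast().
with torch.no_grad(), torch.autocast("cuda"):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scale the similarities and turn them into a distribution over the labels.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)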
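
To make the last step of the snippet above concrete, here is a dependency-free sketch of the same computation: L2-normalize the embeddings, take cosine similarities, multiply by 100, and apply a softmax. The 2-dimensional vectors are made-up stand-ins for real embeddings, and `zero_shot_probs` is a hypothetical helper name, not part of open_clip.

```python
import math


def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and several texts."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    img = normalize(image_vec)
    logits = [scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_vecs]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["xray", "CT", "MRI"]
# Toy embeddings (made up): cosine similarities to the image are 1.0, 0.8 and 0.0.
probs = zero_shot_probs([1.0, 0.0], [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
best = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best, probs)
```

Because of the factor of 100, even a modest gap in cosine similarity produces a very peaked distribution, so the printed probabilities are best read as a ranking of the labels rather than as calibrated confidences.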

Model size: 428M params (Safetensors, tensor type F32)