clip-vit-l-14-pmc-finetuned

This model is a fine-tuned version of openai/clip-vit-large-patch14 on the pmc_oa dataset (https://huggingface.co/datasets/axiong/pmc_oa). It achieves the following results on the evaluation set:

  • Loss: 1.0125

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10.0
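
For reference, a minimal sketch of how these hyperparameters map onto transformers.TrainingArguments, the argument container that run_clip.py builds from its command-line flags. The output_dir is illustrative, and train_batch_size/eval_batch_size correspond to the per-device arguments:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./clip-vit-l-14-pmc-finetuned",  # illustrative output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,            # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
)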

Training results

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1
  • Datasets 2.14.4
  • Tokenizers 0.13.3

Fine-tune this model using the run_clip.py script from the Transformers contrastive-image-text example (https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text):


python -W ignore run_clip.py --model_name_or_path openai/clip-vit-large-patch14 \
      --output_dir ./clip-vit-l-14-pmc-finetuned \
      --train_file data/pmc_roco_train.csv \
      --validation_file data/pmc_roco_valid.csv \
      --image_column image --caption_column caption \
      --max_seq_length 77 \
      --do_train --do_eval \
      --per_device_train_batch_size 16 --per_device_eval_batch_size 8 \
      --remove_unused_columns=False \
      --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
      --overwrite_output_dir  \
      --num_train_epochs 10 \
      --logging_dir ./pmc_vit_logs \
      --save_total_limit 2 \
      --report_to  tensorboard
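
run_clip.py expects the --train_file and --validation_file CSVs to contain one row per image-caption pair, with column names matching --image_column and --caption_column; the image column holds paths to the image files. A minimal sketch of building such a file (the paths and captions below are placeholders, not real PMC-OA entries):

import pandas as pd

rows = [
    {"image": "images/PMC0001_fig1.jpg", "caption": "Chest X-ray showing ..."},
    {"image": "images/PMC0002_fig1.jpg", "caption": "Axial CT of the abdomen ..."},
]
pd.DataFrame(rows).to_csv("data/pmc_roco_train.csv", index=False)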

Usage

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")
processor = CLIPProcessor.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
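
Beyond zero-shot classification, the fine-tuned encoders can be used for image-text retrieval. A minimal sketch continuing from the snippet above, extracting L2-normalized embeddings and computing cosine similarities (the texts compared are the ones already in inputs):

import torch

with torch.no_grad():
    image_embeds = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_embeds = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# normalize so that a dot product gives cosine similarity
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
similarity = image_embeds @ text_embeds.T  # shape (1, 2): image vs. each caption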