---
base_model: openai/clip-vit-large-patch14
tags:
- generated_from_trainer
model-index:
- name: clip-vit-l-14-pmc-finetuned
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# clip-vit-l-14-pmc-finetuned
This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) on the [pmc_oa](https://huggingface.co/datasets/axiong/pmc_oa) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0125
## Model description
The base CLIP ViT-L/14 model fine-tuned on [pmc_oa](https://huggingface.co/datasets/axiong/pmc_oa), a collection of biomedical figure-caption pairs extracted from the PubMed Central Open Access subset. Fine-tuning adapts the joint image-text embedding space to biomedical imagery and captions, which should make the model more suitable for medical image-text retrieval and zero-shot classification than the general-domain checkpoint.
## Intended uses & limitations
More information needed
## Training and evaluation data
Training and validation image-caption pairs come from [pmc_oa](https://huggingface.co/datasets/axiong/pmc_oa). The fine-tuning command below reads them from `data/pmc_roco_train.csv` and `data/pmc_roco_valid.csv`, CSV files with an `image` column (image path) and a `caption` column (figure caption).
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10.0
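
For reference, a minimal sketch of how these hyperparameters map onto Transformers `TrainingArguments`. This is not the exact configuration used; the actual run went through `run_clip.py` (see the fine-tuning command below).

```python
from transformers import TrainingArguments

# Rough equivalent of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./clip-vit-l-14-pmc-finetuned",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
)
```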
### Training results
### Framework versions
- Transformers 4.31.0
- Pytorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.13.3
### Fine-tuning script
The model was fine-tuned with [run_clip.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) from the Transformers contrastive image-text example:
```shell
python -W ignore run_clip.py --model_name_or_path openai/clip-vit-large-patch14 \
--output_dir ./clip-vit-l-14-pmc-finetuned \
--train_file data/pmc_roco_train.csv \
--validation_file data/pmc_roco_valid.csv \
--image_column image --caption_column caption \
--max_seq_length 77 \
--do_train --do_eval \
--per_device_train_batch_size 16 --per_device_eval_batch_size 8 \
--remove_unused_columns=False \
--learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
--overwrite_output_dir \
--num_train_epochs 10 \
--logging_dir ./pmc_vit_logs \
--save_total_limit 2 \
--report_to tensorboard
```
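
`run_clip.py` expects the `--train_file`/`--validation_file` CSVs to contain the columns named by `--image_column` and `--caption_column`. A minimal sketch of building such a file; the paths and captions here are made up for illustration:

```python
import pandas as pd

# Hypothetical rows: each entry pairs a local image path with its caption.
rows = [
    {"image": "images/PMC123456_fig1.jpg", "caption": "Axial CT scan of the chest showing a nodule."},
    {"image": "images/PMC654321_fig2.jpg", "caption": "Immunohistochemical staining of tumor tissue."},
]
pd.DataFrame(rows).to_csv("data/pmc_roco_train.csv", index=False)
```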
### Usage
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
model = CLIPModel.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")
processor = CLIPProcessor.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```
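
Since the model is tuned on biomedical figure-caption pairs, a more domain-appropriate sketch is to extract image and text embeddings for retrieval. The image path and candidate captions below are placeholders, not part of any released dataset:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")
processor = CLIPProcessor.from_pretrained("ryanyip7777/pmc_vit-l-14_hf")

# Hypothetical local figure and candidate captions; replace with your own data.
image = Image.open("figure.png")
captions = [
    "chest x-ray showing a right-sided pleural effusion",
    "brain MRI with contrast enhancement",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Cosine similarity between the figure and each candidate caption.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T
```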