	---
	tags:
	- generated_from_trainer
	model-index:
	- name: persian-clip
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# persian-clip

	This model is a fine-tuned version of an unspecified base model on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7629

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 5
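
To make the `linear` scheduler with 100 warmup steps concrete, the sketch below computes the learning rate at a given optimizer step in plain Python. It follows the usual linear warmup then linear decay rule. The total of 4050 steps is an assumption inferred from the training log (roughly 810 steps per epoch over 5 epochs), and `linear_lr` is a hypothetical helper, not part of the training script.

```python
def linear_lr(step, base_lr=3e-05, warmup_steps=100, total_steps=4050):
    """Learning rate under linear warmup followed by linear decay to zero.

    total_steps=4050 is an assumption inferred from the log (about 810
    steps per epoch for 5 epochs); the card does not state it.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Print the rate at the start, mid-warmup, peak, mid-decay, and end.
for step in (0, 50, 100, 2075, 4050):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 100 and reaches zero at the final step, so the late checkpoints in the results table were trained with a very small learning rate.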

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 1.4072 | 0.12 | 100 | 2.1627 |
	| 1.7146 | 0.25 | 200 | 1.6432 |
	| 1.5058 | 0.37 | 300 | 1.4523 |
	| 1.3836 | 0.49 | 400 | 1.4799 |
	| 1.4946 | 0.62 | 500 | 1.3101 |
	| 1.2544 | 0.74 | 600 | 1.2073 |
	| 1.1984 | 0.86 | 700 | 1.1801 |
	| 1.3243 | 0.99 | 800 | 1.1652 |
	| 0.8373 | 1.11 | 900 | 1.0860 |
	| 0.8625 | 1.23 | 1000 | 1.0731 |
	| 0.791 | 1.36 | 1100 | 1.0427 |
	| 0.8975 | 1.48 | 1200 | 1.0786 |
	| 0.7767 | 1.6 | 1300 | 1.0248 |
	| 0.9041 | 1.73 | 1400 | 1.0311 |
	| 0.8474 | 1.85 | 1500 | 0.9649 |
	| 0.7435 | 1.98 | 1600 | 0.9552 |
	| 0.5126 | 2.1 | 1700 | 0.9909 |
	| 0.4871 | 2.22 | 1800 | 0.9188 |
	| 0.48 | 2.35 | 1900 | 0.9151 |
	| 0.4715 | 2.47 | 2000 | 0.9056 |
	| 0.408 | 2.59 | 2100 | 0.8885 |
	| 0.4999 | 2.72 | 2200 | 0.8911 |
	| 0.5169 | 2.84 | 2300 | 0.8727 |
	| 0.3574 | 2.96 | 2400 | 0.8477 |
	| 0.2749 | 3.09 | 2500 | 0.8666 |
	| 0.2719 | 3.21 | 2600 | 0.8520 |
	| 0.2779 | 3.33 | 2700 | 0.8379 |
	| 0.3407 | 3.46 | 2800 | 0.8386 |
	| 0.223 | 3.58 | 2900 | 0.8245 |
	| 0.2649 | 3.7 | 3000 | 0.8149 |
	| 0.2698 | 3.83 | 3100 | 0.7983 |
	| 0.1863 | 3.95 | 3200 | 0.7959 |
	| 0.1831 | 4.07 | 3300 | 0.7957 |
	| 0.172 | 4.2 | 3400 | 0.7963 |
	| 0.1457 | 4.32 | 3500 | 0.7879 |
	| 0.1503 | 4.44 | 3600 | 0.7794 |
	| 0.1783 | 4.57 | 3700 | 0.7788 |
	| 0.166 | 4.69 | 3800 | 0.7753 |
	| 0.1598 | 4.81 | 3900 | 0.7673 |
	| 0.1618 | 4.94 | 4000 | 0.7629 |


	### Framework versions

	- Transformers 4.38.2
	- PyTorch 2.1.2+cu121
	- Datasets 2.10.1
	- Tokenizers 0.15.0

	### How to use?
	```python
	# Both models generate vectors with 768 dimensions.
	from PIL import Image
	from transformers import CLIPVisionModel, RobertaModel, AutoTokenizer, CLIPFeatureExtractor
	# Note: newer Transformers releases deprecate CLIPFeatureExtractor in favor of CLIPImageProcessor.
	# download pre-trained models
	vision_encoder = CLIPVisionModel.from_pretrained('SeyedAli/Persian-CLIP')
	preprocessor = CLIPFeatureExtractor.from_pretrained('SeyedAli/Persian-CLIP')
	text_encoder = RobertaModel.from_pretrained('SeyedAli/Persian-CLIP')
	tokenizer = AutoTokenizer.from_pretrained('SeyedAli/Persian-CLIP')
	# define input image and input text
	text = 'something'
	image = Image.open('my_favorite_image.jpg')
	# compute embeddings (one 768-dimensional vector per input)
	text_embedding = text_encoder(**tokenizer(text, return_tensors='pt')).pooler_output
	image_embedding = vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output
	```
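
A common way to compare a text embedding with an image embedding is cosine similarity. The sketch below shows the computation in plain Python on short stand-in vectors. `cosine_similarity` is a hypothetical helper, and treating cosine similarity as the right score for this model is an assumption, since the card does not say how the embeddings should be compared. With the tensors from the snippet above, `torch.nn.functional.cosine_similarity(text_embedding, image_embedding)` computes the same quantity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in 3-dimensional vectors; real embeddings have 768 dimensions.
text_vec = [1.0, 2.0, 2.0]
image_vec = [2.0, 4.0, 4.0]
print(cosine_similarity(text_vec, image_vec))
```

The score ranges from -1 to 1 and ignores vector length, so only the direction of each embedding matters.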

	### Zero-shot image classification
	The following are just some use cases of Persian-CLIP on 25K Unsplash images.

	* Install the demo package with `pip install -q git+https://github.com/sajjjadayobi/clipfa.git`
	```python
	from clipfa import CLIPDemo
	# Both models generate vectors with 768 dimensions.
	from transformers import CLIPVisionModel, RobertaModel, AutoTokenizer, CLIPFeatureExtractor
	# download pre-trained models
	vision_encoder = CLIPVisionModel.from_pretrained('SeyedAli/Persian-CLIP')
	preprocessor = CLIPFeatureExtractor.from_pretrained('SeyedAli/Persian-CLIP')
	text_encoder = RobertaModel.from_pretrained('SeyedAli/Persian-CLIP')
	tokenizer = AutoTokenizer.from_pretrained('SeyedAli/Persian-CLIP')

	# build the demo, then embed the candidate texts and the images
	demo = CLIPDemo(vision_encoder, text_encoder, tokenizer)
	# placeholder Persian labels ("text 1", "text 2", "text 3"); replace with real class descriptions
	demo.compute_text_embeddings(['متن 3' ,'متن 2' ,'متن 1'])
	demo.compute_image_embeddings(['my_favorite_image.jpg'])
	# score the candidate texts against the image
	demo.zero_shot(image_path='my_favorite_image.jpg')
	```
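
Conceptually, zero-shot classification scores the image against each candidate text and picks the best match. The sketch below turns a list of image-text similarity scores into probabilities with a softmax. It illustrates the general technique and is not the `clipfa` implementation; `zero_shot_probs`, the example labels, the scores, and the `scale` value are all hypothetical.

```python
import math


def zero_shot_probs(similarities, scale=100.0):
    """Softmax over scaled similarity scores, one per candidate label.

    scale is a hypothetical temperature; a larger value makes the
    distribution sharper.
    """
    logits = [scale * s for s in similarities]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up cosine similarities between one image and three labels.
labels = ["label 1", "label 2", "label 3"]
scores = [0.21, 0.27, 0.18]
probs = zero_shot_probs(scores)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], probs)
```

The label with the highest similarity always wins; the softmax only rescales the scores so they can be read as relative confidence.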