	---
	license: mit
	---

	# Conditional ViT - B/16 - Categories

	Introduced in *Weakly-Supervised Conditional Embedding for Referred Visual Search*, Lepage et al., 2023.

	[`Paper`](https://arxiv.org/abs/2306.02928) \| [`Training Data`](https://huggingface.co/datasets/Slep/LAION-RVS-Fashion) \| [`Training Code`](https://github.com/Simon-Lepage/CondViT-LRVSF) \| [`Demo`](https://huggingface.co/spaces/Slep/CondViT-LRVSF-Demo)

	## General Information

	Model fine-tuned from CLIP ViT-B/16 on LRVSF (LAION-RVS-Fashion) at 224x224 resolution. The conditioning categories are:
	- Bags
	- Feet
	- Hands
	- Head
	- Lower Body
	- Neck
	- Outwear
	- Upper Body
	- Waist
	- Whole Body
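
Since the category is passed as a plain string, a typo would be easy to miss. The snippet below is a minimal sketch of a guard, assuming the processor expects exactly the strings listed above (as `cat = "Bags"` in the usage example suggests). `CATEGORIES` and `check_category` are hypothetical helpers, not part of the model's code.

```python
# Category strings copied from the list above. Assumption: the exact spelling
# (including "Outwear") is what the processor expects.
CATEGORIES = (
    "Bags", "Feet", "Hands", "Head", "Lower Body",
    "Neck", "Outwear", "Upper Body", "Waist", "Whole Body",
)


def check_category(name: str) -> str:
    """Return `name` unchanged if it is a listed category, else raise ValueError."""
    if name not in CATEGORIES:
        raise ValueError(f"Unknown category {name!r}; expected one of {CATEGORIES}")
    return name
```

Calling `check_category(cat)` before building the processor inputs makes a misspelled category fail early with a readable message.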

	Research use only.

	## How to Use

	```python
	from PIL import Image
	import requests
	from transformers import AutoProcessor, AutoModel
	import torch

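	# Load the conditional ViT and its processor from the Hugging Face Hub.
	model = AutoModel.from_pretrained("Slep/CondViT-B16-cat")
	processor = AutoProcessor.from_pretrained("Slep/CondViT-B16-cat")

	# Download an example image and pick one of the conditioning categories listed above.
	url = "https://huggingface.co/datasets/Slep/LAION-RVS-Fashion/resolve/main/assets/108856.0.jpg"
	img = Image.open(requests.get(url, stream=True).raw)
	cat = "Bags"

	# Embed the image conditioned on the category, then L2-normalize the embedding.
	inputs = processor(images=[img], categories=[cat])
	raw_embedding = model(**inputs)
	normalized_embedding = torch.nn.functional.normalize(raw_embedding, dim=-1)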
	```
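
Normalized embeddings can be compared with a dot product, which then equals cosine similarity. The following is a minimal sketch of ranking a gallery against one query. `rank_gallery` is a hypothetical helper, and the random tensors and the 512-dimensional size are placeholders standing in for real model outputs, not values taken from the model.

```python
import torch


def rank_gallery(query: torch.Tensor, gallery: torch.Tensor, top_k: int = 5):
    """Rank gallery embeddings by cosine similarity to a single query.

    query: (D,) tensor; gallery: (N, D) tensor. Both are L2-normalized here,
    so the dot product below is the cosine similarity.
    """
    query = torch.nn.functional.normalize(query, dim=-1)
    gallery = torch.nn.functional.normalize(gallery, dim=-1)
    scores = gallery @ query  # shape (N,)
    k = min(top_k, gallery.shape[0])
    top = torch.topk(scores, k=k)  # sorted from most to least similar
    return top.indices.tolist(), top.values.tolist()


# Random stand-ins for embeddings (placeholder dimension of 512).
torch.manual_seed(0)
gallery = torch.randn(10, 512)
# A query built as a slightly perturbed copy of gallery item 3.
query = gallery[3] + 0.01 * torch.randn(512)

indices, scores = rank_gallery(query, gallery, top_k=3)
```

In practice, `query` and `gallery` would be replaced by embeddings produced as in the snippet above, with the query conditioned on the category of interest.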