Model Card for PickScore v1
This model is a scoring function for images generated from text. It takes as input a prompt and a generated image and outputs a score. It can be used as a general scoring function, and for tasks such as human preference prediction, model evaluation, image ranking, and more. See our paper Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation for more details.
Model Details
Model Description
This model was finetuned from CLIP-H using the Pick-a-Pic dataset.
Model Sources
- Repository: See the PickScore repo
- Paper: Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
- Demo: Hugging Face Spaces demo for PickScore
How to Get Started with the Model
Use the code below to get started with the model.
# import
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

# load model
device = "cuda"  # change to "cpu" if no GPU is available
processor_name_or_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
model_pretrained_name_or_path = "yuvalkirstain/PickScore_v1"
processor = AutoProcessor.from_pretrained(processor_name_or_path)
model = AutoModel.from_pretrained(model_pretrained_name_or_path).eval().to(device)

def calc_probs(prompt, images):
    # preprocess
    image_inputs = processor(
        images=images,
        padding=True,
        truncation=True,
        max_length=77,
        return_tensors="pt",
    ).to(device)
    text_inputs = processor(
        text=prompt,
        padding=True,
        truncation=True,
        max_length=77,
        return_tensors="pt",
    ).to(device)

    with torch.no_grad():
        # embed, then L2-normalize so the dot product is a cosine similarity
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / torch.norm(image_embs, dim=-1, keepdim=True)
        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / torch.norm(text_embs, dim=-1, keepdim=True)
        # score: one scaled similarity per image for the single prompt
        scores = model.logit_scale.exp() * (text_embs @ image_embs.T)[0]
        # get probabilities if you have multiple images to choose from
        probs = torch.softmax(scores, dim=-1)

    return probs.cpu().tolist()

pil_images = [Image.open("my_amazing_images/1.jpg"), Image.open("my_amazing_images/2.jpg")]
prompt = "fantastic, incredible prompt"
print(calc_probs(prompt, pil_images))
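To make the scoring step easier to follow, here is a minimal sketch of the same arithmetic in plain Python, with no model download required. It normalizes the embeddings, scales the cosine similarities, applies a softmax, and then ranks the images. The helper names (l2_normalize, preference_probs, rank_images), the toy 2-dimensional embeddings, and the logit_scale value of 10.0 are all assumptions chosen for illustration; they are not part of the PickScore code or the real model's parameters.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def preference_probs(text_emb, image_embs, logit_scale):
    """Softmax over scaled cosine similarities between one prompt and N images.

    logit_scale is a hypothetical stand-in for model.logit_scale.exp().
    """
    t = l2_normalize(text_emb)
    scores = []
    for emb in image_embs:
        v = l2_normalize(emb)
        scores.append(logit_scale * sum(a * b for a, b in zip(t, v)))
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_images(probs):
    """Return image indices ordered from most to least preferred."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Toy embeddings (illustrative only): image 1 points the same way as the prompt.
text_emb = [1.0, 0.0]
image_embs = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
probs = preference_probs(text_emb, image_embs, logit_scale=10.0)
order = rank_images(probs)
print(probs, order)
```

With real embeddings, the list returned by calc_probs could be passed to a ranking helper like rank_images in the same way to order candidate images for a prompt.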
Training Details
Training Data
This model was trained on the Pick-a-Pic dataset.
Training Procedure
TODO - add paper.
Citation
If you find this work useful, please cite:
@inproceedings{Kirstain2023PickaPicAO,
  title={Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation},
  author={Yuval Kirstain and Adam Polyak and Uriel Singer and Shahbuland Matiana and Joe Penna and Omer Levy},
  year={2023}
}