PerceptCLIP-Memorability is a model that predicts image memorability, i.e., how likely an image is to be remembered. This is the official model from the paper:
📄 "Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks". We apply LoRA adaptation to the CLIP visual encoder and add an MLP head for memorability prediction. Our model achieves state-of-the-art results.

Training Details

  • Dataset: LaMem (Large-Scale Image Memorability)
  • Architecture: CLIP Vision Encoder (ViT-L/14) with LoRA adaptation
  • Loss Function: Mean Squared Error (MSE) Loss for memorability prediction
  • Optimizer: AdamW
  • Learning Rate: 5e-05
  • Batch Size: 32

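As a rough illustration of this setup, the sketch below wires a LoRA-adapted CLIP ViT-L/14 vision encoder to a small MLP regression head and trains it with MSE loss and AdamW at the learning rate above. The actual class ships as modeling.py in this repo (clip_lora_model); the LoRA rank, target modules, and MLP sizes here are assumptions for illustration only.

import torch
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

class MemorabilityModelSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # ViT-L/14 CLIP vision encoder; LoRA rank/targets are illustrative assumptions
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
        self.backbone = get_peft_model(backbone, lora_cfg)
        # Small MLP regression head on the pooled visual features (sizes are assumptions)
        self.head = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, pixel_values):
        feats = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(feats)

model = MemorabilityModelSketch()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
criterion = nn.MSELoss()

# One training step over a batch of 32 preprocessed images and their memorability targets:
# preds = model(images)                 # images: (32, 3, 224, 224)
# loss = criterion(preds, targets)      # targets: (32, 1)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
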
Installation & Requirements

You can set up the environment using environment.yml or install the dependencies manually:

  • python=3.9.15
  • cudatoolkit=11.7
  • torchvision=0.14.0
  • transformers=4.45.2
  • peft=0.14.0

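A quick way to confirm the installed versions from Python (a minimal check, assuming the packages above are importable):

import torch, torchvision, transformers, peft
print(torch.__version__, torchvision.__version__, transformers.__version__, peft.__version__)
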
Usage

To use the model for inference:

from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import importlib.util

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model class definition dynamically
class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Memorability", filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", class_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

# Initialize the model
ModelClass = modeling.clip_lora_model
model = ModelClass().to(device)

# Load the pretrained weights
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Memorability", filename="perceptCLIP_Memorability.pth")
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Load an image
image = Image.open("image_path.jpg").convert("RGB")

# Preprocess (CLIP's standard resize, 224x224 center crop, and normalization) and predict
def Mem_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),  
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073), 
                             std=(0.26862954, 0.26130258, 0.27577711))
    ])
    return transform

image = Mem_preprocess()(image).unsqueeze(0).to(device)

with torch.no_grad():
    mem_score = model(image).item()

print(f"Predicted Memorability Score: {mem_score:.4f}")

Citation

If you use this model in your research, please cite:

@article{zalcher2025don,
  title={Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks},
  author={Zalcher, Amit and Wasserman, Navve and Beliy, Roman and Heinimann, Oliver and Irani, Michal},
  journal={arXiv preprint arXiv:2503.13260},
  year={2025}
}