Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality
This repository contains the official model weights for the paper "Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality", accepted as a long paper at ACL 2026.
Introduction
MACCO (MAsked Compositional Concept MOdeling) is a framework designed to enhance compositional understanding in vision-language models (VLMs) like CLIP. It addresses the "bag-of-words" limitation by masking compositional concepts in one modality and reconstructing them conditioned on the full contextual information from the other modality. This process enables the model to capture and align cross-modal compositional structures, such as object relations and attribute-object bindings, more effectively than standard contrastive training.
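To make the "bag-of-words" limitation concrete, the toy sketch below shows two captions with different meanings that share exactly the same set of words, and then masks one attribute-object concept on the text side. It is only an illustration of the idea and not the paper's masking procedure: the helpers `bag_of_words` and `mask_concept`, the `[MASK]` token, and the example captions are all hypothetical.

```python
from collections import Counter


def bag_of_words(caption):
    """Order-insensitive view of a caption: word counts only."""
    return Counter(caption.lower().split())


def mask_concept(caption, concept, mask_token="[MASK]"):
    """Replace the first occurrence of a multi-word concept with mask tokens.

    Hypothetical helper for illustration; returns the caption unchanged
    if the concept does not occur.
    """
    words = caption.split()
    target = [t.lower() for t in concept.split()]
    n = len(target)
    for i in range(len(words) - n + 1):
        if [w.lower() for w in words[i:i + n]] == target:
            return " ".join(words[:i] + [mask_token] * n + words[i + n:])
    return caption


# Two captions that describe different scenes but contain the same words.
caption_a = "a dog chasing a cat"
caption_b = "a cat chasing a dog"
same_bag = bag_of_words(caption_a) == bag_of_words(caption_b)

# Mask an attribute-object binding; in a cross-modal setup the paired
# image would supply the context needed to reconstruct the masked span.
masked = mask_concept("a red cube on a blue sphere", "red cube")
print(same_bag, "|", masked)
```

A model that treats captions as unordered word sets cannot tell `caption_a` from `caption_b`, which is the failure mode that reconstructing masked concepts from the other modality is meant to counter.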
Usage
You can load these checkpoints using the open_clip library.
import open_clip
import torch

# Path to the downloaded .pt file (e.g., 'MACCO-CLIP-ViT-B-32.pt')
pretrained_path = 'path/to/MACCO-CLIP-ViT-B-32.pt'
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create model and load the MACCO weights
model, _, image_preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32',
    pretrained=pretrained_path,
    device=device
)
model = model.eval()

print("MACCO model loaded successfully!")
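Once the model is loaded, one way to use it is to score an image against several candidate captions by cosine similarity, as with any CLIP-style model. The following is a minimal sketch under stated assumptions rather than an official evaluation script: `score_captions` and `rank_captions` are hypothetical helper names, the image path and captions are placeholders, and the tokenizer is assumed to be the one returned by `open_clip.get_tokenizer('ViT-B-32')`.

```python
def rank_captions(scores, captions):
    """Pair each caption with its score and sort from best to worst."""
    return sorted(zip(captions, scores), key=lambda pair: pair[1], reverse=True)


def score_captions(model, tokenizer, image_preprocess, image_path, captions, device="cpu"):
    """Return one image-text cosine similarity per caption.

    Hypothetical helper: torch and Pillow are imported here so the ranking
    helper above stays usable without them.
    """
    import torch
    from PIL import Image

    image = image_preprocess(Image.open(image_path).convert("RGB"))
    image = image.unsqueeze(0).to(device)      # shape: [1, 3, H, W]
    tokens = tokenizer(captions).to(device)    # shape: [N, context_length]

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(tokens)
        # L2-normalize so the dot product equals cosine similarity.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        similarities = (image_features @ text_features.T).squeeze(0)

    return similarities.tolist()


# Example usage (placeholder path and captions; assumes `model`,
# `image_preprocess`, and `device` from the loading snippet):
# tokenizer = open_clip.get_tokenizer('ViT-B-32')
# captions = ["a dog chasing a cat", "a cat chasing a dog"]
# scores = score_captions(model, tokenizer, image_preprocess,
#                         'path/to/image.jpg', captions, device)
# print(rank_captions(scores, captions))
```

Comparing captions that differ only in word order or attribute binding is a simple way to probe compositional behavior, since a bag-of-words model would assign them nearly identical scores.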
Citation
If you find this work useful for your research, please consider citing:
@misc{li2026crossmodalmaskedcompositionalconcept,
  title={Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality},
  author={Wei Li and Zhen Huang and Xinmei Tian},
  year={2026},
  eprint={2606.13288},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.13288},
}
Base model of hiker-lw/MACCO: openai/clip-vit-base-patch16