---
pipeline_tag: image-classification
tags:
- vision
inference: false
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
  example_title: Cat & Dog
---

# Category Search from External Databases (CaSED)

Disclaimer: This model card is adapted from the [official repository](https://github.com/altndrr/vic). The method is described in the [paper](https://arxiv.org/abs/2306.00917).

## Intended uses & limitations

You can use the model for vocabulary-free image classification, i.e. classification with CLIP-like models without a pre-defined list of class names.

## How to use

Here is how to use this model:

```python
import requests
from PIL import Image
from transformers import AutoModel, CLIPProcessor

# download an image from the internet
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load the model and the processor
model = AutoModel.from_pretrained("altndrr/cased", trust_remote_code=True)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# get the model outputs
images = processor(images=[image], return_tensors="pt", padding=True)
outputs = model(images, alpha=0.7)
labels, scores = outputs["vocabularies"][0], outputs["scores"][0]

# print the top 5 most likely labels for the image
values, indices = scores.sort(dim=-1, descending=True)
print("\nTop predictions:\n")
for value, index in zip(values[:5], indices[:5]):
    print(f"{labels[index]:>16s}: {100 * value.item():.2f}%")
```

The model depends on several libraries that must be installed manually before running it:

```bash
pip install torch faiss-cpu flair inflect nltk pyarrow transformers
```

The example above additionally uses `requests` and `Pillow` to download and open the image.

## Citation

```latex
@article{conti2023vocabularyfree,
  title={Vocabulary-free Image Classification},
  author={Alessandro Conti and Enrico Fini and Massimiliano Mancini and Paolo Rota and Yiming Wang and Elisa Ricci},
  year={2023},
  journal={NeurIPS},
}
```