---
title: Generate With OpenAI CLIP
emoji: 🎨🤖🖼️
colorFrom: red
colorTo: indigo
sdk: streamlit
sdk_version: 1.37.1
app_file: app.py
pinned: false
license: afl-3.0
---
# Image Understanding Model 🎨🤖
This application leverages OpenAI's CLIP (Contrastive Language-Image Pretraining) model to analyze an image and decide which of several user-provided text descriptions matches it best. It uses Streamlit to provide an interactive web interface where users upload an image and enter candidate descriptions; the model then ranks the descriptions by how well each one fits the image and reports the result as probabilities.
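The core comparison can be sketched in a few lines of CLIP/PyTorch code. This is a minimal illustration, not the actual app.py: it assumes the `clip` package installed from OpenAI's GitHub repository, and the helper name `best_description` is made up for this example.

```python
import clip
import torch
from PIL import Image

# Load the pretrained CLIP model and its matching image preprocessing pipeline.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def best_description(image: Image.Image, descriptions: list) -> tuple:
    """Return the description CLIP considers most likely, with its probability."""
    image_input = preprocess(image).unsqueeze(0).to(device)
    text_input = clip.tokenize(descriptions).to(device)

    with torch.no_grad():
        # logits_per_image has shape (1, len(descriptions)).
        logits_per_image, _ = model(image_input, text_input)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

    best = int(probs.argmax())
    return descriptions[best], float(probs[best])
```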
## Features
- Image Upload: Users can upload an image (JPG, PNG, or JPEG).
- Description Input: Users input 3 descriptions about the image (e.g., 2 false and 1 true).
- Prediction: The model predicts the most likely description out of the three and provides a confidence score.
- Progress Bar: A visual progress bar displays the confidence of the best description.
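These interface pieces correspond to standard Streamlit widgets. Below is a rough sketch of the input side, with illustrative labels and variable names (the real app.py may differ):

```python
import streamlit as st
from PIL import Image

st.title("Image Understanding Model 🎨🤖")

# Image Upload: accept a single JPG, PNG, or JPEG file.
uploaded = st.file_uploader("Upload an image", type=["jpg", "png", "jpeg"])

# Description Input: three candidate descriptions, e.g. two lies and one truth.
descriptions = [st.text_input(f"Description {i + 1}") for i in range(3)]

# Show the uploaded image once it is available.
image = Image.open(uploaded) if uploaded is not None else None
if image is not None:
    st.image(image, caption="Uploaded image")
```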
## How It Works
- Upload an Image: Users upload an image of their choice.
- Input Descriptions: Users are prompted to enter 3 descriptions about the image, with 1 description being true.
- Model Prediction: Once the descriptions are submitted, the CLIP model evaluates the image and the provided descriptions to predict which description best matches the image.
- Result Display: The app displays the best-matching description and its corresponding probability, along with a progress bar showing the confidence of the prediction.
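Putting these steps together, the prediction and result display could look roughly like the fragment below. It deliberately reuses the hypothetical `best_description` helper, `image`, and `descriptions` from the sketches above rather than repeating them:

```python
import streamlit as st

if st.button("Which description is true?") and image is not None and all(descriptions):
    # Model Prediction: score the three descriptions against the image.
    best, confidence = best_description(image, descriptions)

    # Result Display: the best-matching description and its probability.
    st.write(f'Best match: "{best}" ({confidence:.1%} confidence)')

    # Progress Bar: visualize how confident the model is in that choice.
    st.progress(confidence)
```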
## Technology Stack
- OpenAI CLIP: The core model used for image and text understanding.
- Torch: Used for model inference and handling tensors.
- Streamlit: Provides the interactive web interface for uploading images and entering descriptions.
- Pillow: For handling image processing.
- NumPy: For efficient array and matrix operations.
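For completeness, a requirements file covering this stack might look roughly as follows. Only the Streamlit version comes from the Space metadata; the other entries are unpinned, and installing CLIP directly from OpenAI's GitHub repository is the usual approach:

```
streamlit==1.37.1
torch
Pillow
numpy
git+https://github.com/openai/CLIP.git
```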
## Application
A simple Streamlit app for playing "Two Lies and a Truth" with the model and friends.