---
license: mit
base_model: laion/CLIP-ViT-B-32-laion2B-s34B-b79K
tags:
- generated_from_trainer
model-index:
- name: laion-finetuned_v5e7_epoch10_fold0_threshold3
  results: []
---

# laion-finetuned_room luxury annotater

This model is a fine-tuned version of [laion/CLIP-ViT-B-32-laion2B-s34B-b79K](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K) on a private dataset provided by Wahi Inc. It is designed to classify room images into categories based on their luxury level and room type.

## Model Description

This model leverages a fine-tuned version of CLIP, specifically optimized for real estate image annotation. It performs zero-shot classification of room images into categories like standard or contemporary kitchens, bathrooms, and other common rooms in real estate properties.

The model uses a multi-stage approach where diffusion models generate supplementary training data, and hierarchical CLIP networks perform luxury annotation. This fine-tuning process enables high accuracy in distinguishing luxury levels from real estate images.

The model was developed for the paper *"Diffusion-based Data Augmentation and Hierarchical CLIP for Real Estate Image Annotation"* submitted to the *Pattern Analysis and Applications Special Issue on Multimedia Sensing and Computing*.

![Model Framework](framework.png)

## Intended Uses & Limitations

This model is intended to be used for:

- Annotating real estate images by classifying room types and luxury levels (e.g., standard or contemporary kitchens, bathrooms, etc.).
- Helping users filter properties in real estate platforms based on the luxury level of rooms.

**Limitations**:

- The model is optimized for real estate images and may not generalize well to other domains.
- Zero-shot classification is limited to the predefined categories and candidate labels used during fine-tuning.

## Training and Evaluation Data

The training data was collected and labeled by Wahi Inc.
and includes a diverse set of real estate images of kitchens, bathrooms, dining rooms, living rooms, and foyers. The images were annotated as either standard or contemporary, based on the room's aesthetics, design, and quality.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- **Learning Rate**: 1e-06
- **Train Batch Size**: 384
- **Eval Batch Size**: 24
- **Seed**: 42
- **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- **LR Scheduler Type**: Linear

### Framework Versions

- **Transformers**: 4.37.2
- **PyTorch**: 2.0.1+cu117
- **Datasets**: 2.14.4
- **Tokenizers**: 0.15.0

### Output Example

Below is an example of the model's output, where an image of a kitchen is classified with its top 3 predicted room types and confidence scores.

![Model Output Example](example_output.png)

## How to Use the Model

You can use this model for zero-shot image classification with the Hugging Face `pipeline` API. Here is a basic example:

```python
from PIL import Image
from transformers import pipeline

# Initialize the pipeline
classifier = pipeline("zero-shot-image-classification", model="strollingorange/roomLuxuryAnnotater")

# Define the candidate labels
candidate_labels = [
    "a photo of standard bathroom",
    "a photo of contemporary bathroom",
    "a photo of standard kitchen",
    "a photo of contemporary kitchen",
    "a photo of standard foyer",
    "a photo of standard living room",
    "a photo of standard dining room",
    "a photo of contemporary foyer",
    "a photo of contemporary living room",
    "a photo of contemporary dining room",
]

# Load your image (replace the path below with your actual image path)
image = Image.open("path_to_your_image.jpg")

# Run zero-shot classification
result = classifier(image, candidate_labels=candidate_labels)

# Output the result
print(result)
```

## Acknowledgments

We would like to acknowledge Wahi Inc. for providing the training data and their continued support in the development of this model.
Their collaboration was essential in fine-tuning the model for real estate image annotation.
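
## Post-processing Predictions

Each candidate label in the usage example packs a luxury level and a room type into one string, such as `a photo of contemporary kitchen`. For filtering properties by luxury level, it can help to split these back into separate fields. The following is a minimal sketch, not part of the released model: the helper names `parse_label` and `summarize` are hypothetical, the sample scores are made up for illustration, and the code assumes the pipeline returns a list of dictionaries with `score` and `label` keys and that labels follow the `a photo of <level> <room>` pattern shown above.

```python
PREFIX = "a photo of "  # assumed prompt prefix, taken from the candidate labels above


def parse_label(label: str) -> tuple[str, str]:
    """Split 'a photo of <level> <room>' into (level, room)."""
    text = label[len(PREFIX):] if label.startswith(PREFIX) else label
    # The first word is the luxury level; the remainder is the room type.
    level, _, room = text.partition(" ")
    return level, room


def summarize(result: list[dict], top_k: int = 3) -> list[dict]:
    """Return the top_k predictions with luxury level and room type separated."""
    ranked = sorted(result, key=lambda r: r["score"], reverse=True)[:top_k]
    summary = []
    for entry in ranked:
        level, room = parse_label(entry["label"])
        summary.append(
            {"luxury_level": level, "room_type": room, "score": entry["score"]}
        )
    return summary


# Illustrative input only: these scores are invented, not real model output.
example_result = [
    {"score": 0.25, "label": "a photo of standard kitchen"},
    {"score": 0.70, "label": "a photo of contemporary kitchen"},
    {"score": 0.05, "label": "a photo of contemporary dining room"},
]

for row in summarize(example_result, top_k=2):
    print(row)
```

In practice, `example_result` would be replaced by the `result` returned from `classifier(image, candidate_labels=candidate_labels)`.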