manisharma494 committed on
Commit 427f366 · verified · 1 Parent(s): a2cddbc

Upload 5 files

Files changed (5)
  1. README.md +365 -13
  2. download_images.py +220 -0
  3. photos_url.csv +0 -0
  4. requirements.txt +5 -0
  5. start_app.py +398 -0
README.md CHANGED
@@ -1,19 +1,371 @@
  ---
- title: Virtual Search System
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
  pinned: false
- short_description: Streamlit template space
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
+ title: Visual Search System
+ emoji: 🔍
+ colorFrom: blue
+ colorTo: green
+ sdk: streamlit
+ sdk_version: "1.37.0"
+ app_file: app.py
  pinned: false
+ license: mit
  ---

+
+ # Modern Visual Search System (CLIP-based)
+
+ ## Overview
+
+ This project implements a modern, scalable visual search system built on OpenAI's CLIP model. It provides both a REST API (FastAPI) and a web interface (Streamlit) for searching large image datasets with natural language or image queries, returning the top 5 most visually relevant images.
+
+ ## Features
+
+ - **REST API**: FastAPI backend with comprehensive endpoints
+ - **Web Interface**: Streamlit frontend with an intuitive UI
+ - **Text Search**: search images using natural language descriptions
+ - **Image Search**: find similar images by providing an image as the query
+ - **Image Upload**: upload an image to find similar images in the dataset
+ - **Top 5 Results**: always returns the 5 most similar images
+ - **Fast Search**: uses precomputed embeddings for optimal performance
+ - **GPU Acceleration**: automatic GPU detection and utilization
+ - **API Documentation**: auto-generated Swagger/OpenAPI docs
+
+ ## Architecture
+
+ ```
+ ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+ │    Streamlit    │    │     FastAPI     │    │   CLIP Model    │
+ │    Frontend     │◄──►│     Backend     │◄──►│    (Encoder)    │
+ │   (Port 8501)   │    │   (Port 8000)   │    │                 │
+ └─────────────────┘    └─────────────────┘    └─────────────────┘
+                                 │
+                                 ▼
+                        ┌─────────────────┐
+                        │      Image      │
+                        │     Database    │
+                        │    (images/)    │
+                        └─────────────────┘
+ ```
+
+ ## Technical Details
+
+ - **Image & Text Features**: CLIP (ViT-B-32, OpenAI weights)
+ - **Search Algorithm**: cosine similarity in the CLIP embedding space
+ - **Embedding Dimension**: 512 (default for ViT-B-32)
+ - **Performance**: fast, GPU-accelerated when available
+ - **Dependencies**: `open-clip-torch`, `torch`, `fastapi`, `streamlit`, `Pillow`, `numpy`
+
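The ranking step described above reduces to a cosine-similarity search over the precomputed embedding matrix. A minimal NumPy sketch (function and array names here are illustrative, not the project's actual identifiers):

```python
import numpy as np

def top_k_cosine(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 5):
    """Rank database images by cosine similarity to a query embedding.

    query_emb:  shape (512,) CLIP embedding of the text or image query.
    image_embs: shape (N, 512) precomputed embeddings of the image database.
    Returns a list of (index, score) pairs for the k most similar images.
    """
    # Normalize both sides so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    db = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = db @ q                    # shape (N,): one similarity per image
    top = np.argsort(-scores)[:k]      # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in top]
```

With `top_k=5` (the system's default), this returns the five best-matching images regardless of query modality, since CLIP places text and image embeddings in the same space.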
+ ## Quick Start
+
+ ### 1. Install Dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Prepare Your Images
+
+ The application will automatically check and download images for you:
+
+ ```bash
+ # Check image status
+ python start_app.py --check-only
+
+ # Download missing images (if any)
+ python start_app.py --download-images
+
+ # Or let the app handle everything automatically
+ python start_app.py
+ ```
+
+ **Automatic Features:**
+ - ✅ Checks whether images are already downloaded
+ - ✅ Downloads missing images in parallel (up to 20 workers)
+ - ✅ Skips existing images to save time
+ - ✅ Shows progress and statistics
+ - ✅ Handles errors gracefully
+ - ✅ **Smart startup**: lets the application start once enough images are available (default: 1000+)
+ - ✅ **Flexible requirements**: the minimum image count is adjustable
+ - ✅ **Graceful degradation**: works even with a partial image database
+
+ ### 3. Start the Application
+
+ #### Option A: Start Both Services (Recommended)
+ ```bash
+ python start_app.py
+ ```
+
+ This will:
+ 1. Check dependencies
+ 2. Verify/download images automatically
+ 3. Start both the FastAPI backend and Streamlit frontend
+
+ #### Command Line Options
+
+ ```bash
+ # Basic startup (recommended)
+ python start_app.py
+
+ # Skip automatic image download
+ python start_app.py --skip-download
+
+ # Force download images even if some exist
+ python start_app.py --download-images
+
+ # Use a custom number of parallel workers
+ python start_app.py --max-workers 10
+
+ # Use a custom images directory
+ python start_app.py --images-dir my_images
+
+ # Set a custom minimum image requirement
+ python start_app.py --min-images 500
+
+ # Only check image status, don't start services
+ python start_app.py --check-only
+ ```
+
+ #### Option B: Start Services Separately
+
+ **Start FastAPI Backend:**
+ ```bash
+ cd api && python main.py
+ ```
+
+ **Start Streamlit Frontend (in another terminal):**
+ ```bash
+ streamlit run streamlit_app.py
+ ```
+
+ #### Option C: Download Images Only
+
+ ```bash
+ # Download all images
+ python download_images.py
+
+ # Download with custom settings
+ python download_images.py --max-workers 15 --output-dir images
+
+ # Check status only
+ python download_images.py --check-only
+ ```
+
+ ### 4. Access the Application
+
+ - **Web Interface**: http://localhost:8501
+ - **API Documentation**: http://localhost:8000/docs
+ - **API Health Check**: http://localhost:8000/api/health
+
+ ## API Endpoints
+
+ ### Core Endpoints
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/` | GET | Root endpoint with API info |
+ | `/api/health` | GET | Health check |
+ | `/api/info` | GET | System information |
+ | `/api/images` | GET | List available images |
+
+ ### Search Endpoints
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/api/search/text` | POST | Text-based search |
+ | `/api/search/image` | POST | Image-based search |
+ | `/api/search/image/upload` | POST | Upload an image for search |
+
+ ### Example API Usage
+
+ #### Text Search
+ ```bash
+ curl -X POST "http://localhost:8000/api/search/text" \
+      -H "Content-Type: application/json" \
+      -d '{"query": "red car", "top_k": 5}'
+ ```
+
+ #### Image Search
+ ```bash
+ curl -X POST "http://localhost:8000/api/search/image" \
+      -H "Content-Type: application/json" \
+      -d '{"image_path": "images/0001.jpg", "top_k": 5}'
+ ```
+
+ #### Image Upload Search
+ ```bash
+ curl -X POST "http://localhost:8000/api/search/image/upload" \
+      -F "file=@your_image.jpg" \
+      -F "top_k=5"
+ ```
+
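The same text-search call can be made from Python using only the standard library; a small client sketch mirroring the curl example above (it assumes the FastAPI server is running locally on port 8000 and that the endpoint accepts the documented JSON body):

```python
import json
from urllib import request

BASE = "http://localhost:8000"

def search_text(query: str, top_k: int = 5) -> dict:
    """POST a text query to /api/search/text and return the parsed JSON response."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    req = request.Request(
        f"{BASE}/api/search/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a real client you would likely use `requests` or `httpx` instead; the stdlib version is shown so the sketch has no extra dependencies.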
+ ## Web Interface Features
+
+ ### Text Search
+ - Enter natural language descriptions
+ - Examples: "a cat sitting", "red car", "person walking", "building with windows"
+ - Returns the images that best match the text description
+
+ ### Image Search
+ - Select an image from your dataset
+ - Finds images visually similar to the selected image
+ - Excludes the query image from the results
+
+ ### Image Upload Search
+ - Upload any image file (JPG, PNG, BMP, TIFF)
+ - Find similar images in your dataset
+ - Real-time processing and results
+
+ ### Results Display
+ - **Ranking**: 1-5 (top 5 results)
+ - **Filename**: name of the image file
+ - **Path**: full path to the image
+ - **Similarity Score**: percentage indicating how well the image matches the query
+ - **Image Preview**: visual display of the results
+
+ ## Testing
+
+ ### Test the API
+ ```bash
+ python test_api.py
+ ```
+
+ ### Test the CLI (Legacy)
+ ```bash
+ python test_search.py
+ ```
+
+ ### Run Demo
+ ```bash
+ python demo_search.py
+ ```
+
+ ## Development
+
+ ### Project Structure
+ ```
+ visual-search-system/
+ ├── api/
+ │   └── main.py                    # FastAPI backend
+ ├── src/
+ │   └── models/
+ │       └── multimodal_encoder.py  # CLIP encoder
+ ├── images/                        # Image database
+ ├── models/
+ │   └── clip_index/                # Precomputed embeddings
+ ├── streamlit_app.py               # Streamlit frontend
+ ├── start_app.py                   # Startup script
+ ├── test_api.py                    # API tests
+ ├── run_search.py                  # CLI interface (legacy)
+ └── requirements.txt               # Dependencies
+ ```
+
+ ### Adding New Features
+
+ 1. **New API Endpoints**: add to `api/main.py`
+ 2. **New UI Components**: add to `streamlit_app.py`
+ 3. **New Search Methods**: extend `src/models/multimodal_encoder.py`
+
+ ## Performance
+
+ - **Precomputed Embeddings**: uses the existing embeddings in `models/clip_index/` for faster search
+ - **Real-time Fallback**: falls back to real-time encoding if precomputed embeddings aren't available
+ - **GPU Acceleration**: automatically uses the GPU, if available, for faster processing
+ - **Memory Efficient**: processes images in batches to manage memory usage
+ - **Async Processing**: FastAPI handles concurrent requests efficiently
+
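The precomputed-embeddings-with-fallback pattern described above can be sketched as a cache-or-compute helper. The index file path and the encoder callable below are illustrative assumptions, not the project's actual API:

```python
import os
import numpy as np

# Hypothetical location of the cached index inside models/clip_index/
INDEX_PATH = "models/clip_index/embeddings.npy"

def load_or_build_embeddings(image_paths, encode_fn, batch_size=64):
    """Load cached embeddings if present; otherwise encode in batches and cache.

    encode_fn: callable mapping a list of image paths to an (N, 512) array,
               e.g. a wrapper around the CLIP image encoder.
    """
    if os.path.exists(INDEX_PATH):
        return np.load(INDEX_PATH)  # fast path: precomputed index

    # Fallback: encode in small batches to keep memory bounded
    batches = [encode_fn(image_paths[i:i + batch_size])
               for i in range(0, len(image_paths), batch_size)]
    embs = np.concatenate(batches) if batches else np.empty((0, 512))

    # Cache for the next startup
    os.makedirs(os.path.dirname(INDEX_PATH), exist_ok=True)
    np.save(INDEX_PATH, embs)
    return embs
```

Caching the matrix this way is what makes repeated searches fast: the CLIP forward pass runs once per image ever, and each query only costs one encoder call plus a matrix-vector product.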
+ ## Deployment
+
+ ### Hugging Face Spaces (Recommended)
+
+ This application is designed to work seamlessly on Hugging Face Spaces:
+
+ 1. **Fork this repository** to your Hugging Face account
+ 2. **Create a new Space** on Hugging Face Spaces
+ 3. **Connect your forked repository** to the Space
+ 4. **The application will automatically**:
+    - Check for downloaded images
+    - Download missing images in parallel
+    - Start the Streamlit interface
+    - Handle all dependencies automatically
+
+ **Features for Hugging Face Spaces:**
+ - ✅ Automatic image downloading with progress tracking
+ - ✅ Optimized for the Spaces environment (reduced parallel workers)
+ - ✅ Built-in error handling and recovery
+ - ✅ No manual setup required
+ - ✅ Works with the provided `photos_url.csv` dataset
+
+ **Space Configuration:**
+ - **SDK**: Streamlit
+ - **Hardware**: CPU Basic (free tier) or GPU for faster processing
+ - **App File**: `app.py`
+
+ ### Docker Deployment
+
+ Create a `Dockerfile`:
+
+ ```Dockerfile
+ FROM python:3.9-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose ports
+ EXPOSE 8000 8501
+
+ # Start the application
+ CMD ["python", "start_app.py"]
+ ```
+
+ Build and run:
+ ```bash
+ docker build -t visual-search .
+ docker run -p 8000:8000 -p 8501:8501 -v $(pwd)/images:/app/images visual-search
+ ```
+
+ ### Production Deployment
+
+ For production, consider:
+ - Running under a production ASGI setup (e.g., Gunicorn with Uvicorn workers)
+ - Setting up proper CORS configuration
+ - Implementing authentication
+ - Using a reverse proxy (nginx)
+ - Setting up monitoring and logging
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ - **API Connection Failed**: make sure the FastAPI server is running on port 8000
+ - **No images found**: ensure images are in the `images/` directory with the `.jpg` extension
+ - **Model loading errors**: check that all dependencies are installed correctly
+ - **GPU issues**: install the CUDA build appropriate for your GPU
+ - **Memory errors**: reduce the batch size or use CPU-only mode
+ - **Port conflicts**: change the ports in `start_app.py` if needed
+
+ ### Debug Mode
+
+ Run with debug information:
+ ```bash
+ # API with debug logging
+ cd api && uvicorn main:app --reload --log-level debug
+
+ # Streamlit with debug
+ streamlit run streamlit_app.py --logger.level debug
+ ```
+
+ ## License
+ MIT
download_images.py ADDED
@@ -0,0 +1,220 @@
+ """
+ Image Downloader for Photo Dataset
+ ----------------------------------
+ Downloads and optimizes images from the URLs in photos_url.csv using parallel workers.
+
+ Requirements:
+     pip install pandas pillow requests tqdm
+
+ Usage:
+     1. Ensure photos_url.csv is in the same directory as this script
+     2. Run the script: python download_images.py
+     3. Images will be downloaded to the 'images' folder
+
+ Note:
+     - Default maximum image size is 800x800 pixels (aspect ratio preserved)
+     - Images are saved as optimized JPEGs
+     - Use the num_images parameter to download fewer images
+     - The full dataset is roughly 1.5 GB (about 25,000 images)
+     - Downloads run in parallel for efficiency
+ """
+
+ import os
+ import time
+ from io import BytesIO
+ from pathlib import Path
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ import pandas as pd
+ import requests
+ from PIL import Image
+ from tqdm import tqdm
+
+ def download_single_image(args):
+     """
+     Download a single image with error handling.
+
+     Args:
+         args: Tuple of (idx, url, output_path, target_size)
+
+     Returns:
+         Tuple of (success, idx, error_message)
+     """
+     idx, url, output_path, target_size = args
+
+     try:
+         # Skip if the file already exists
+         if os.path.exists(output_path):
+             return True, idx, "Already exists"
+
+         response = requests.get(url, timeout=15, stream=True)
+         if response.status_code == 200:
+             # Decode, normalize mode, resize, and save as an optimized JPEG
+             img = Image.open(BytesIO(response.content))
+             if img.mode in ('RGBA', 'P'):
+                 img = img.convert('RGB')
+             img.thumbnail(target_size, Image.Resampling.LANCZOS)
+             img.save(output_path, 'JPEG', quality=85, optimize=True)
+             return True, idx, None
+         else:
+             return False, idx, f"HTTP {response.status_code}"
+
+     except Exception as e:
+         return False, idx, str(e)
+
+ def check_images_downloaded(output_dir="images", expected_count=None):
+     """
+     Check whether images are already downloaded.
+
+     Args:
+         output_dir: Directory to check for images
+         expected_count: Expected number of images (optional)
+
+     Returns:
+         Tuple of (is_complete, current_count, missing_count)
+     """
+     images_dir = Path(output_dir)
+     if not images_dir.exists():
+         return False, 0, expected_count or 0
+
+     # Count existing images
+     existing_images = list(images_dir.glob("*.jpg"))
+     current_count = len(existing_images)
+
+     if expected_count is None:
+         # Try to get the expected count from the CSV
+         try:
+             df = pd.read_csv("photos_url.csv")
+             expected_count = len(df)
+         except Exception:
+             expected_count = current_count
+
+     missing_count = max(0, expected_count - current_count)
+     is_complete = missing_count == 0
+
+     return is_complete, current_count, missing_count
+
+ def download_images(num_images=None, output_dir="images", target_size=(800, 800), max_workers=20):
+     """
+     Download and optimize images from photos_url.csv with parallel processing.
+
+     Args:
+         num_images: Number of images to download (default: all images in the CSV)
+         output_dir: Directory to save images (default: 'images')
+         target_size: Max image dimensions (default: (800, 800))
+         max_workers: Maximum number of parallel download threads (default: 20)
+     """
+     # Create the output directory
+     os.makedirs(output_dir, exist_ok=True)
+
+     # Read the CSV and prepare the dataset
+     df = pd.read_csv("photos_url.csv")
+     if num_images:
+         df = df.head(num_images)
+
+     total_images = len(df)
+     print(f"📊 Total images to process: {total_images:,}")
+
+     # Check existing images
+     is_complete, current_count, missing_count = check_images_downloaded(output_dir, total_images)
+
+     if is_complete:
+         print(f"✅ All {current_count:,} images are already downloaded!")
+         return True
+
+     print(f"📥 Found {current_count:,} existing images, need to download {missing_count:,} more")
+
+     # Prepare download tasks for missing images only
+     download_tasks = []
+     for idx, row in df.iterrows():
+         filename = f"{(idx + 1):04d}.jpg"
+         output_path = os.path.join(output_dir, filename)
+
+         if not os.path.exists(output_path):
+             download_tasks.append((idx, row['photo_image_url'], output_path, target_size))
+
+     if not download_tasks:
+         print("✅ All images are already downloaded!")
+         return True
+
+     print(f"🚀 Starting parallel download of {len(download_tasks):,} missing images with {max_workers} workers...")
+     start_time = time.time()
+
+     successful_downloads = 0
+     failed_downloads = 0
+     skipped_downloads = 0
+
+     with ThreadPoolExecutor(max_workers=max_workers) as executor:
+         # Submit all tasks
+         future_to_task = {executor.submit(download_single_image, task): task for task in download_tasks}
+
+         # Process completed tasks with a progress bar
+         with tqdm(total=len(download_tasks), desc="Downloading images") as pbar:
+             for future in as_completed(future_to_task):
+                 success, idx, error = future.result()
+
+                 if success:
+                     if error == "Already exists":
+                         skipped_downloads += 1
+                     else:
+                         successful_downloads += 1
+                 else:
+                     failed_downloads += 1
+                     # Suppress 404 and DNS errors to reduce noise
+                     if error and "404" not in str(error) and "NameResolutionError" not in str(error):
+                         print(f"❌ Failed to download image {idx + 1}: {error}")
+
+                 pbar.update(1)
+
+     duration = time.time() - start_time
+
+     print(f"\n📈 Download Summary:")
+     print(f"   ✅ New downloads: {successful_downloads:,}")
+     print(f"   ⏭️ Skipped (already exist): {skipped_downloads:,}")
+     print(f"   ❌ Failed: {failed_downloads:,}")
+     print(f"   ⏱️ Duration: {duration:.1f} seconds")
+     if duration > 0:
+         print(f"   🚀 Speed: {successful_downloads / duration:.1f} images/second")
+
+     # Final check; consider the run successful at 95% coverage or better
+     is_complete, final_count, final_missing = check_images_downloaded(output_dir, total_images)
+
+     if final_count >= total_images * 0.95:
+         print(f"🎉 Download completed! Now have {final_count:,} images ({final_missing:,} missing)")
+         return True
+     elif final_count > current_count:
+         print(f"✅ Download partially successful! Now have {final_count:,} images ({final_missing:,} missing)")
+         return True
+     else:
+         print(f"⚠️ Download had issues. Still have {final_count:,} images ({final_missing:,} missing)")
+         return False
+
+ if __name__ == "__main__":
+     import argparse
+
+     parser = argparse.ArgumentParser(description="Download images with parallel processing")
+     parser.add_argument("--num-images", type=int, default=None, help="Number of images to download (default: all)")
+     parser.add_argument("--output-dir", type=str, default="images", help="Output directory (default: images)")
+     parser.add_argument("--max-workers", type=int, default=20, help="Maximum parallel workers (default: 20)")
+     parser.add_argument("--check-only", action="store_true", help="Only check if images are downloaded")
+
+     args = parser.parse_args()
+
+     if args.check_only:
+         # Just report the status
+         is_complete, current_count, missing_count = check_images_downloaded(args.output_dir)
+         if is_complete:
+             print(f"✅ All {current_count:,} images are downloaded!")
+         else:
+             print(f"📊 Status: {current_count:,} downloaded, {missing_count:,} missing")
+     else:
+         success = download_images(
+             num_images=args.num_images,
+             output_dir=args.output_dir,
+             max_workers=args.max_workers
+         )
+         exit(0 if success else 1)
photos_url.csv ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ streamlit>=1.28.0
+ pandas>=1.5.0
+ requests>=2.28.0
+ Pillow>=9.0.0
+ tqdm>=4.64.0
start_app.py ADDED
@@ -0,0 +1,398 @@
+ #!/usr/bin/env python3
+ """
+ Visual Search System - Complete Streamlit App
+ =============================================
+
+ A comprehensive Streamlit application that:
+ 1. Automatically installs required dependencies
+ 2. Downloads images from photos_url.csv if needed
+ 3. Provides a clean UI for searching and viewing images
+ 4. Supports both search-by-ID and range-by-block browsing
+
+ Requirements:
+ - photos_url.csv: contains the image URLs
+ - download_images.py: contains the parallel downloading logic
+ - images/ folder: created and populated with downloaded images
+
+ Usage:
+     streamlit run start_app.py
+ """
+
+ import os
+ import sys
+ import subprocess
+ import importlib
+ from pathlib import Path
+ from typing import List, Optional, Tuple
+
+ import pandas as pd
+ import streamlit as st
+
+ # Configuration
+ # Maps import names to pip package names (they differ for PIL/Pillow)
+ REQUIRED_PACKAGES = {
+     "streamlit": "streamlit",
+     "pandas": "pandas",
+     "requests": "requests",
+     "PIL": "Pillow",
+     "tqdm": "tqdm",
+ }
+
+ IMAGES_DIR = "images"
+ CSV_FILE = "photos_url.csv"
+ DOWNLOAD_SCRIPT = "download_images.py"
+ MAX_DISPLAY_IMAGES = 500
+ IMAGES_PER_BLOCK = 100
+ TOTAL_BLOCKS = 250
+
+ def install_package(package: str) -> bool:
+     """
+     Install a Python package using pip.
+
+     Args:
+         package: pip package name to install
+
+     Returns:
+         True if successful, False otherwise
+     """
+     try:
+         subprocess.check_call([sys.executable, "-m", "pip", "install", package])
+         return True
+     except subprocess.CalledProcessError:
+         return False
+
+ def check_and_install_dependencies() -> bool:
+     """
+     Check that required packages are installed; install any that are missing.
+
+     Returns:
+         True if all dependencies are available, False otherwise
+     """
+     print("🔍 Checking dependencies...")
+
+     missing_packages = []
+
+     for module_name, pip_name in REQUIRED_PACKAGES.items():
+         try:
+             importlib.import_module(module_name)
+             print(f"✅ {module_name} is already installed")
+         except ImportError:
+             missing_packages.append((module_name, pip_name))
+
+     if missing_packages:
+         print(f"🚀 Installing {len(missing_packages)} missing packages...")
+
+         for module_name, pip_name in missing_packages:
+             print(f"📥 Installing {pip_name}...")
+             if install_package(pip_name):
+                 print(f"✅ Successfully installed {pip_name}")
+             else:
+                 print(f"❌ Failed to install {pip_name}")
+                 return False
+
+         # Verify installations
+         for module_name, _ in missing_packages:
+             try:
+                 importlib.import_module(module_name)
+                 print(f"✅ {module_name} verified after installation")
+             except ImportError:
+                 print(f"❌ {module_name} still not available after installation")
+                 return False
+
+     print("✅ All dependencies are available!")
+     return True
+
+ def check_images_status() -> Tuple[bool, int, int]:
+     """
+     Check the status of downloaded images.
+
+     Returns:
+         Tuple of (is_complete, current_count, total_count)
+     """
+     images_path = Path(IMAGES_DIR)
+
+     if not images_path.exists():
+         return False, 0, 0
+
+     # Count existing images
+     existing_images = list(images_path.glob("*.jpg"))
+     current_count = len(existing_images)
+
+     # Get the total count from the CSV
+     try:
+         df = pd.read_csv(CSV_FILE)
+         total_count = len(df)
+     except Exception as e:
+         print(f"❌ Error reading {CSV_FILE}: {e}")
+         return False, current_count, 0
+
+     # Consider the download complete once 95%+ of images are present
+     is_complete = current_count >= total_count * 0.95
+
+     return is_complete, current_count, total_count
+
+ def download_images_if_needed() -> bool:
+     """
+     Download images if they're missing or incomplete.
+
+     Returns:
+         True if images are available, False otherwise
+     """
+     print("🔍 Checking image status...")
+
+     is_complete, current_count, total_count = check_images_status()
+
+     if is_complete:
+         print(f"✅ Images are ready! Have {current_count:,} of {total_count:,} images")
+         return True
+
+     print(f"📥 Images incomplete: {current_count:,} of {total_count:,} available")
+     print("🚀 Starting image download...")
+
+     try:
+         # Import the download function from download_images.py
+         sys.path.append('.')
+         from download_images import download_images
+
+         success = download_images(
+             num_images=None,  # Download all images
+             output_dir=IMAGES_DIR,
+             max_workers=20
+         )
+
+         if success:
+             print("✅ Image download completed successfully!")
+         else:
+             print("⚠️ Image download had some issues, but continuing...")
+         return True
+
+     except Exception as e:
+         print(f"❌ Error during image download: {e}")
+         return False
+
+ def get_image_path(image_id: str) -> Optional[str]:
+     """
+     Get the file path for a given image ID.
+
+     Args:
+         image_id: Image ID (e.g., "0001", "1234")
+
+     Returns:
+         File path if it exists, None otherwise
+     """
+     try:
+         # Convert the image ID to the zero-padded filename format
+         if image_id.isdigit():
+             filename = f"{int(image_id):04d}.jpg"
+         else:
+             filename = f"{image_id}.jpg"
+
+         image_path = os.path.join(IMAGES_DIR, filename)
+
+         return image_path if os.path.exists(image_path) else None
+     except Exception:
+         return None
+
+ def get_block_images(block_number: int) -> List[str]:
+     """
+     Get all images for a specific block.
+
+     Args:
+         block_number: Block number (1-250)
+
+     Returns:
+         List of image paths for the block
+     """
+     if not (1 <= block_number <= TOTAL_BLOCKS):
+         return []
+
+     # Calculate the start and end image numbers for this block
+     start_num = (block_number - 1) * IMAGES_PER_BLOCK + 1
+     end_num = block_number * IMAGES_PER_BLOCK
+
+     image_paths = []
+
+     for i in range(start_num, end_num + 1):
+         image_path = get_image_path(str(i))
+         if image_path:
+             image_paths.append(image_path)
+
+     return image_paths
+
+ def search_images_by_id(search_id: str) -> List[str]:
+     """
+     Search for images by ID.
+
+     Args:
+         search_id: Search term (can be partial)
+
+     Returns:
+         List of matching image paths
+     """
+     if not search_id.strip():
+         # Return the first 500 images if there is no search term
+         return [get_image_path(str(i)) for i in range(1, MAX_DISPLAY_IMAGES + 1)
+                 if get_image_path(str(i))]
+
+     matching_paths = []
+
+     # Try an exact match first
+     exact_path = get_image_path(search_id)
+     if exact_path:
+         matching_paths.append(exact_path)
+
+     # Then search for partial matches against the zero-padded ID
+     for i in range(1, 25001):  # Total images in the dataset
+         image_path = get_image_path(str(i))
+         if image_path and search_id.strip() in f"{i:04d}":
+             if image_path not in matching_paths:
+                 matching_paths.append(image_path)
+             if len(matching_paths) >= MAX_DISPLAY_IMAGES:
+                 break
+
+     return matching_paths
+
+ def display_image_grid(image_paths: List[str], title: str):
+     """
+     Display a grid of images using Streamlit.
+
+     Args:
+         image_paths: List of image file paths
+         title: Title for the image grid
+     """
+     if not image_paths:
+         st.warning("No images found matching your criteria.")
+         return
+
+     st.subheader(f"{title} ({len(image_paths)} images)")
+
+     # Lay the images out in a 3-column grid
+     cols = st.columns(3)
+
+     for idx, image_path in enumerate(image_paths):
+         with cols[idx % 3]:
+             try:
+                 st.image(image_path, caption=f"Image {os.path.basename(image_path)}", use_column_width=True)
+             except Exception as e:
+                 st.error(f"Error loading image: {e}")
+
+ def main():
+     """Main Streamlit application"""
+
+     # Page configuration
+     st.set_page_config(
+         page_title="Visual Search System",
+         page_icon="🔍",
+         layout="wide",
+         initial_sidebar_state="expanded"
+     )
+
+     # Main title
+     st.title("🔍 Visual Search System")
+     st.markdown("---")
+
+     # Sidebar for navigation
+     st.sidebar.header("Navigation")
+     search_option = st.sidebar.selectbox(
+         "Choose search method:",
+         ["Search by ID", "Range by Block"]
+     )
+
+     # Main content area
+     if search_option == "Search by ID":
+         st.header("🔎 Search Images by ID")
+
+         # Search input
+         search_id = st.text_input(
+             "Enter image ID (e.g., '0001', '1234') or leave empty to see the first 500 images:",
+             placeholder="Enter ID or leave empty",
+             help="Enter a specific image ID or leave empty to browse the first 500 images"
+         )
+
+         # Search button (also triggers on any non-empty input)
+         if st.button("🔍 Search", type="primary") or search_id != "":
+             with st.spinner("Searching images..."):
+                 matching_images = search_images_by_id(search_id)
+
+             if matching_images:
+                 display_image_grid(
+                     matching_images,
+                     f"Showing {len(matching_images)} matching images"
+                 )
+             else:
+                 st.info("No images found matching your search criteria.")
+
+     else:  # Range by Block
+         st.header("📦 Browse Images by Block")
+
+         st.markdown(f"""
+         **How it works:**
+         - Each block contains **{IMAGES_PER_BLOCK} images**
+         - Enter a number between **1 and {TOTAL_BLOCKS}**
+         - Example: enter **100** to see images **9,901-10,000**
+         """)
+
+         # Block input
+         block_number = st.number_input(
+             f"Enter block number (1-{TOTAL_BLOCKS}):",
+             min_value=1,
+             max_value=TOTAL_BLOCKS,
+             value=1,
+             step=1,
+             help=f"Choose a block number from 1 to {TOTAL_BLOCKS}"
+         )
+
+         # Calculate and display block info
+         start_num = (block_number - 1) * IMAGES_PER_BLOCK + 1
+         end_num = block_number * IMAGES_PER_BLOCK
+
+         st.info(f"**Block {block_number}**: Images {start_num:,} to {end_num:,}")
+
+         # Get the block's images
+         with st.spinner(f"Loading block {block_number}..."):
+             block_images = get_block_images(block_number)
+
+         if block_images:
+             display_image_grid(
+                 block_images,
+                 f"Block {block_number} - Images {start_num:,} to {end_num:,}"
+             )
+         else:
+             st.warning(f"No images found for block {block_number}.")
+
+     # Footer
+     st.markdown("---")
+     st.markdown(
+         "**Dataset Info:** 25,000+ high-quality images from Unsplash | "
+         "Built with Streamlit and Python"
+     )
+
+ def setup_and_run():
+     """Set up dependencies and run the app"""
+     print("🚀 Starting Visual Search System...")
+
+     # Step 1: Install dependencies
+     if not check_and_install_dependencies():
+         print("❌ Failed to install dependencies. Exiting.")
+         sys.exit(1)
+
+     print("✅ Dependencies ready!")
+
+     # Step 2: Check and download images
+     if not download_images_if_needed():
+         print("❌ Failed to prepare images. Exiting.")
+         sys.exit(1)
+
+     print("✅ Images ready!")
+
+     # Step 3: Launch the Streamlit UI
+     print("🚀 Launching Streamlit app...")
+     main()
+
+ if __name__ == "__main__":
+     setup_and_run()