LeooNic committed on
Commit b29fbac · Parent: b6c74eb

Deploy Food-101 classifier with 84.49% accuracy


- EfficientNet-B0 model achieving 84.49% test accuracy
- ONNX optimized for 7ms inference time
- Interactive Gradio interface with example images
- Supports 101 food classes from Food-101 dataset
- Ready for production deployment

README.md CHANGED
@@ -1,14 +1,122 @@
1
  ---
2
- title: Food 101 Classifier
3
- emoji: 👁
4
- colorFrom: red
5
  colorTo: green
6
  sdk: gradio
7
- sdk_version: 5.46.1
8
  app_file: app.py
9
  pinned: false
10
  license: mit
11
- short_description: My Space
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Food-101 AI Classifier
3
+ emoji: 🍔
4
+ colorFrom: blue
5
  colorTo: green
6
  sdk: gradio
7
+ sdk_version: 4.44.0
8
  app_file: app.py
9
  pinned: false
10
  license: mit
 
11
  ---
12
 
13
+ # 🍔 Food-101 AI Classifier
14
+
15
+ An AI-powered food image classifier trained on the Food-101 dataset, capable of recognizing 101 different types of food with high accuracy.
16
+
17
+ ## 🎯 Model Performance
18
+
19
+ - **Architecture**: EfficientNet-B0 (fine-tuned)
20
+ - **Test Accuracy**: 84.49%
21
+ - **Top-5 Accuracy**: 96.72%
22
+ - **Inference Speed**: ~7ms per image
23
+ - **Model Size**: 15.77 MB (ONNX optimized)
24
+
25
+ ## 🚀 Features
26
+
27
+ - **Fast Inference**: ONNX-optimized model with ~7 ms per-image predictions
28
+ - **High Accuracy**: 84.49% top-1 / 96.72% top-5 accuracy on the Food-101 test set
29
+ - **User-Friendly Interface**: Clean and intuitive Gradio web interface
30
+ - **Real-time Predictions**: Upload any food image and get instant results
31
+ - **Confidence Scores**: See how confident the model is about each prediction
32
+
33
+ ## 📊 Dataset
34
+
35
+ This model was trained on the [Food-101 dataset](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/), which contains:
36
+ - **101 food categories**
37
+ - **1,000 images per category**
38
+ - **Total**: 101,000 images
39
+
40
+ ## 🏆 Recognized Food Categories
41
+
42
+ The model can classify the following 101 food types:
43
+
44
+ ```
45
+ apple_pie, baby_back_ribs, baklava, beef_carpaccio, beef_tartare, beet_salad,
46
+ beignets, bibimbap, bread_pudding, breakfast_burrito, bruschetta, caesar_salad,
47
+ cannoli, caprese_salad, carrot_cake, ceviche, cheese_plate, cheesecake,
48
+ chicken_curry, chicken_quesadilla, chicken_wings, chocolate_cake, chocolate_mousse,
49
+ churros, clam_chowder, club_sandwich, crab_cakes, creme_brulee, croque_madame,
50
+ cup_cakes, deviled_eggs, donuts, dumplings, edamame, eggs_benedict, escargots,
51
+ falafel, filet_mignon, fish_and_chips, foie_gras, french_fries, french_onion_soup,
52
+ french_toast, fried_calamari, fried_rice, frozen_yogurt, garlic_bread, gnocchi,
53
+ greek_salad, grilled_cheese_sandwich, grilled_salmon, guacamole, gyoza, hamburger,
54
+ hot_and_sour_soup, hot_dog, huevos_rancheros, hummus, ice_cream, lasagna,
55
+ lobster_bisque, lobster_roll_sandwich, macaroni_and_cheese, macarons, miso_soup,
56
+ mussels, nachos, omelette, onion_rings, oysters, pad_thai, paella, pancakes,
57
+ panna_cotta, peking_duck, pho, pizza, pork_chop, poutine, prime_rib, pulled_pork_sandwich,
58
+ ramen, ravioli, red_velvet_cake, risotto, samosa, sashimi, scallops, seaweed_salad,
59
+ shrimp_and_grits, spaghetti_bolognese, spaghetti_carbonara, spring_rolls, steak,
60
+ strawberry_shortcake, sushi, tacos, takoyaki, tiramisu, tuna_tartare, waffles
61
+ ```
62
+
63
+ ## 🛠️ Technical Details
64
+
65
+ ### Architecture
66
+ - **Base Model**: EfficientNet-B0 (pre-trained on ImageNet)
67
+ - **Fine-tuning**: Transfer learning with Food-101 dataset
68
+ - **Optimization**: ONNX Runtime for fast inference
69
+ - **Input Size**: 224×224×3 RGB images
70
+
71
+ ### Training Pipeline
72
+ 1. **Data Augmentation**: Albumentations library for robust training
73
+ 2. **Transfer Learning**: Fine-tuned pre-trained EfficientNet-B0
74
+ 3. **Advanced Training**: Early stopping, gradient clipping, AMP
75
+ 4. **Validation**: 10% held-out validation set for model selection (see the split sketch after this list)
76
+
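+ The validation split is deterministic (seeded); a minimal sketch of reproducing it with the helpers in `scripts/train.py`, assuming this repository checkout and the bundled `food-101/food-101` data:
+
+ ```python
+ # Sketch only: reproduce the train/val/test split used during training (seed=42, val_split=0.1)
+ import sys
+ from pathlib import Path
+
+ sys.path.append("scripts")  # load_food101_splits lives in scripts/train.py
+ from train import load_food101_splits
+
+ train_s, val_s, test_s, idx_to_class = load_food101_splits(Path("food-101/food-101"), val_split=0.1, seed=42)
+ print(len(train_s), len(val_s), len(test_s), len(idx_to_class))  # with the standard Food-101 metadata: 68175 / 7575 / 25250 samples, 101 classes
+ ```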
77
+ ### Deployment Stack
78
+ - **Model Format**: ONNX (optimized for inference)
79
+ - **Backend**: Python with ONNX Runtime (see the inference sketch after this list)
80
+ - **Frontend**: Gradio web interface
81
+ - **Hosting**: Hugging Face Spaces
82
+
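+ As a rough illustration of what the backend does per request (not the exact app code; the preprocessing approximates `scripts/predict.py`, and the image path is illustrative):
+
+ ```python
+ # Minimal ONNX Runtime inference sketch with ImageNet normalization
+ import numpy as np
+ import onnxruntime as ort
+ from PIL import Image
+
+ session = ort.InferenceSession("models/efficientnet_b0_food101.onnx", providers=["CPUExecutionProvider"])
+ img = Image.open("my_food_photo.jpg").convert("RGB").resize((224, 224))
+ x = np.asarray(img, dtype=np.float32) / 255.0
+ x = (x - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
+ x = x.transpose(2, 0, 1)[None].astype(np.float32)            # NCHW, batch of 1
+ logits = session.run(None, {session.get_inputs()[0].name: x})[0]
+ print(int(logits.argmax()))                                   # index into meta/classes.txt
+ ```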
83
+ ## 📝 How to Use
84
+
85
+ 1. **Upload an Image**: Click on the upload area or drag & drop a food image
86
+ 2. **Set Predictions**: Choose how many top predictions you want (1-10)
87
+ 3. **Get Results**: Click "Submit" to see predictions with confidence scores
88
+ 4. **Try Examples**: Use the provided example images to test the model
89
+
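+ Outside the web UI, the bundled ONNX predictor can also be driven from Python. A hedged sketch, assuming this repository checkout (the sample image is one of the bundled examples):
+
+ ```python
+ # Sketch only: programmatic use of scripts/predict.py's Food101Predictor
+ import sys
+ from pathlib import Path
+
+ sys.path.append("scripts")  # predict.py imports `train` from the same folder
+ from predict import Food101Predictor
+ from train import load_food101_splits
+
+ _, _, _, idx_to_class = load_food101_splits(Path("food-101/food-101"), val_split=0.1, seed=42)
+ class_names = [idx_to_class[i] for i in range(len(idx_to_class))]
+ predictor = Food101Predictor(Path("models/efficientnet_b0_food101.onnx"), class_names)
+ classes, probs, ms = predictor.predict(Path("food-101/food-101/images/pizza/1001116.jpg"), top_k=5)
+ print(classes[0], f"{probs[0]:.1%}", f"{ms:.1f} ms")
+ ```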
90
+ ## 🧪 Model Evaluation
91
+
92
+ ### Performance Metrics
93
+ - **Accuracy**: 84.49%
94
+ - **Macro F1-Score**: 84.40%
95
+ - **Weighted F1-Score**: 84.40%
96
+ - **Top-5 Accuracy**: 96.72%
97
+
98
+ ### Most Challenging Classes
99
+ The model struggles most with:
100
+ 1. Steak (51.35% F1-score)
101
+ 2. Apple Pie (63.36% F1-score)
102
+ 3. Pork Chop (66.02% F1-score)
103
+
104
+ ## 🔬 Explainability
105
+
106
+ The repository ships a Grad-CAM utility (`scripts/gradcam.py`) that visualizes which regions of the image the model focuses on for a given prediction, adding transparency to the decision-making process.
107
+
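+ A hedged sketch of generating such an overlay with the helpers in `scripts/gradcam.py`; note that this needs a PyTorch checkpoint, which is not part of this Space, so the checkpoint path below is hypothetical:
+
+ ```python
+ # Sketch only: Grad-CAM overlay using the functions defined in scripts/gradcam.py
+ import sys
+ import numpy as np
+ import torch
+ from PIL import Image
+
+ sys.path.append("scripts")
+ from gradcam import build_preprocess, generate_gradcam, overlay_heatmap
+ from train import build_model
+
+ model = build_model("efficientnet_b0", num_classes=101, pretrained=False, freeze_backbone=False)
+ model.load_state_dict(torch.load("checkpoints/efficientnet_b0.pt", map_location="cpu"))  # hypothetical checkpoint path
+ model.eval()
+
+ original = np.array(Image.open("food-101/food-101/images/pizza/1001116.jpg").convert("RGB"))
+ tensor = build_preprocess(224)(image=original)["image"].unsqueeze(0)
+ heatmap = generate_gradcam(model, tensor, target_class=None)
+ Image.fromarray(overlay_heatmap(original, heatmap, alpha=0.4)).save("gradcam_pizza.jpg")
+ ```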
108
+ ## 📜 License
109
+
110
+ This project is licensed under the MIT License - see the LICENSE file for details.
111
+
112
+ ## 🙏 Acknowledgments
113
+
114
+ - **Dataset**: Food-101 dataset by Bossard et al.
115
+ - **Framework**: PyTorch and torchvision
116
+ - **Optimization**: ONNX Runtime
117
+ - **Interface**: Gradio
118
+ - **Hosting**: Hugging Face Spaces
119
+
120
+ ---
121
+
122
+ Built with ❤️ using PyTorch, ONNX, and Gradio
app.py ADDED
@@ -0,0 +1,41 @@
1
+ """Hugging Face Spaces deployment app for Food-101 classification."""
2
+
3
+ # This is the main app file expected by Hugging Face Spaces
4
+ # It imports and runs the Gradio app from gradio_app/app.py
5
+
6
+ import sys
7
+ from pathlib import Path
8
+
9
+ # Add current directory to path
10
+ sys.path.append(str(Path(__file__).parent))
11
+
12
+ # Import the Gradio app
13
+ from gradio_app.app import GradioFood101App
14
+
15
+ def main():
16
+ """Main function for Hugging Face Spaces deployment."""
17
+ try:
18
+ # Initialize the app
19
+ print("[HF SPACES] Initializing Food-101 Classifier App...")
20
+ app = GradioFood101App()
21
+
22
+ # Create interface
23
+ print("[HF SPACES] Creating Gradio interface...")
24
+ interface = app.create_interface()
25
+
26
+ # Launch the app for HF Spaces
27
+ print("[HF SPACES] Launching app for Hugging Face Spaces...")
28
+ interface.launch(
29
+ share=False,
30
+ server_name="0.0.0.0",
31
+ server_port=7860,
32
+ show_error=True,
33
+ # note: enable_queue is no longer a launch() argument in Gradio 4.x; queueing is enabled by default
34
+ )
35
+
36
+ except Exception as e:
37
+ print(f"[ERROR] Failed to launch HF Spaces app: {e}")
38
+ raise
39
+
40
+ if __name__ == "__main__":
41
+ main()
food-101/food-101/images/hamburger/100057.jpg ADDED
food-101/food-101/images/ice_cream/1004744.jpg ADDED
food-101/food-101/images/pizza/1001116.jpg ADDED
food-101/food-101/meta/classes.txt ADDED
@@ -0,0 +1,101 @@
1
+ apple_pie
2
+ baby_back_ribs
3
+ baklava
4
+ beef_carpaccio
5
+ beef_tartare
6
+ beet_salad
7
+ beignets
8
+ bibimbap
9
+ bread_pudding
10
+ breakfast_burrito
11
+ bruschetta
12
+ caesar_salad
13
+ cannoli
14
+ caprese_salad
15
+ carrot_cake
16
+ ceviche
17
+ cheesecake
18
+ cheese_plate
19
+ chicken_curry
20
+ chicken_quesadilla
21
+ chicken_wings
22
+ chocolate_cake
23
+ chocolate_mousse
24
+ churros
25
+ clam_chowder
26
+ club_sandwich
27
+ crab_cakes
28
+ creme_brulee
29
+ croque_madame
30
+ cup_cakes
31
+ deviled_eggs
32
+ donuts
33
+ dumplings
34
+ edamame
35
+ eggs_benedict
36
+ escargots
37
+ falafel
38
+ filet_mignon
39
+ fish_and_chips
40
+ foie_gras
41
+ french_fries
42
+ french_onion_soup
43
+ french_toast
44
+ fried_calamari
45
+ fried_rice
46
+ frozen_yogurt
47
+ garlic_bread
48
+ gnocchi
49
+ greek_salad
50
+ grilled_cheese_sandwich
51
+ grilled_salmon
52
+ guacamole
53
+ gyoza
54
+ hamburger
55
+ hot_and_sour_soup
56
+ hot_dog
57
+ huevos_rancheros
58
+ hummus
59
+ ice_cream
60
+ lasagna
61
+ lobster_bisque
62
+ lobster_roll_sandwich
63
+ macaroni_and_cheese
64
+ macarons
65
+ miso_soup
66
+ mussels
67
+ nachos
68
+ omelette
69
+ onion_rings
70
+ oysters
71
+ pad_thai
72
+ paella
73
+ pancakes
74
+ panna_cotta
75
+ peking_duck
76
+ pho
77
+ pizza
78
+ pork_chop
79
+ poutine
80
+ prime_rib
81
+ pulled_pork_sandwich
82
+ ramen
83
+ ravioli
84
+ red_velvet_cake
85
+ risotto
86
+ samosa
87
+ sashimi
88
+ scallops
89
+ seaweed_salad
90
+ shrimp_and_grits
91
+ spaghetti_bolognese
92
+ spaghetti_carbonara
93
+ spring_rolls
94
+ steak
95
+ strawberry_shortcake
96
+ sushi
97
+ tacos
98
+ takoyaki
99
+ tiramisu
100
+ tuna_tartare
101
+ waffles
food-101/food-101/meta/labels.txt ADDED
@@ -0,0 +1,101 @@
1
+ Apple pie
2
+ Baby back ribs
3
+ Baklava
4
+ Beef carpaccio
5
+ Beef tartare
6
+ Beet salad
7
+ Beignets
8
+ Bibimbap
9
+ Bread pudding
10
+ Breakfast burrito
11
+ Bruschetta
12
+ Caesar salad
13
+ Cannoli
14
+ Caprese salad
15
+ Carrot cake
16
+ Ceviche
17
+ Cheesecake
18
+ Cheese plate
19
+ Chicken curry
20
+ Chicken quesadilla
21
+ Chicken wings
22
+ Chocolate cake
23
+ Chocolate mousse
24
+ Churros
25
+ Clam chowder
26
+ Club sandwich
27
+ Crab cakes
28
+ Creme brulee
29
+ Croque madame
30
+ Cup cakes
31
+ Deviled eggs
32
+ Donuts
33
+ Dumplings
34
+ Edamame
35
+ Eggs benedict
36
+ Escargots
37
+ Falafel
38
+ Filet mignon
39
+ Fish and chips
40
+ Foie gras
41
+ French fries
42
+ French onion soup
43
+ French toast
44
+ Fried calamari
45
+ Fried rice
46
+ Frozen yogurt
47
+ Garlic bread
48
+ Gnocchi
49
+ Greek salad
50
+ Grilled cheese sandwich
51
+ Grilled salmon
52
+ Guacamole
53
+ Gyoza
54
+ Hamburger
55
+ Hot and sour soup
56
+ Hot dog
57
+ Huevos rancheros
58
+ Hummus
59
+ Ice cream
60
+ Lasagna
61
+ Lobster bisque
62
+ Lobster roll sandwich
63
+ Macaroni and cheese
64
+ Macarons
65
+ Miso soup
66
+ Mussels
67
+ Nachos
68
+ Omelette
69
+ Onion rings
70
+ Oysters
71
+ Pad thai
72
+ Paella
73
+ Pancakes
74
+ Panna cotta
75
+ Peking duck
76
+ Pho
77
+ Pizza
78
+ Pork chop
79
+ Poutine
80
+ Prime rib
81
+ Pulled pork sandwich
82
+ Ramen
83
+ Ravioli
84
+ Red velvet cake
85
+ Risotto
86
+ Samosa
87
+ Sashimi
88
+ Scallops
89
+ Seaweed salad
90
+ Shrimp and grits
91
+ Spaghetti bolognese
92
+ Spaghetti carbonara
93
+ Spring rolls
94
+ Steak
95
+ Strawberry shortcake
96
+ Sushi
97
+ Tacos
98
+ Takoyaki
99
+ Tiramisu
100
+ Tuna tartare
101
+ Waffles
food-101/food-101/meta/test.json ADDED
The diff for this file is too large to render. See raw diff
 
food-101/food-101/meta/test.txt ADDED
The diff for this file is too large to render. See raw diff
 
food-101/food-101/meta/train.json ADDED
The diff for this file is too large to render. See raw diff
 
food-101/food-101/meta/train.txt ADDED
The diff for this file is too large to render. See raw diff
 
gradio_app/app.py ADDED
@@ -0,0 +1,225 @@
1
+ """Gradio demo app for Food-101 classification."""
2
+
3
+ import sys
4
+ from pathlib import Path
5
+ from typing import Tuple, Dict, List
6
+ import time
7
+ import tempfile
8
+
9
+ import gradio as gr
10
+ import numpy as np
11
+ from PIL import Image
12
+
13
+ # Add scripts directory to path
14
+ project_root = Path(__file__).parent.parent
15
+ sys.path.append(str(project_root / "scripts"))
16
+
17
+ from predict import Food101Predictor
18
+ from train import load_food101_splits
19
+
20
+
21
+ class GradioFood101App:
22
+ """Gradio application for Food-101 classification."""
23
+
24
+ def __init__(self):
25
+ """Initialize the Gradio app with the ONNX predictor."""
26
+ self.predictor = None
27
+ self.load_model()
28
+
29
+ def load_model(self):
30
+ """Load the ONNX predictor."""
31
+ try:
32
+ # Paths
33
+ model_path = project_root / "models/efficientnet_b0_food101.onnx"
34
+ data_dir = project_root / "food-101/food-101"
35
+
36
+ # Load class names
37
+ _, _, _, idx_to_class = load_food101_splits(data_dir, val_split=0.1, seed=42)
38
+ class_names = [idx_to_class[i] for i in range(len(idx_to_class))]
39
+
40
+ # Initialize predictor
41
+ self.predictor = Food101Predictor(model_path, class_names)
42
+ print(f"[GRADIO] Model loaded successfully with {len(class_names)} classes")
43
+
44
+ except Exception as e:
45
+ print(f"[ERROR] Failed to load model: {e}")
46
+ raise
47
+
48
+ def predict_image(self, image: Image.Image, top_k: int = 5) -> Tuple[Dict, str]:
49
+ """
50
+ Predict food class for uploaded image.
51
+
52
+ Args:
53
+ image: PIL Image
54
+ top_k: Number of top predictions
55
+
56
+ Returns:
57
+ (confidences_dict, info_text)
58
+ """
59
+ if image is None:
60
+ return {}, "Please upload an image first!"
61
+
62
+ if self.predictor is None:
63
+ return {}, "Model not loaded. Please try again."
64
+
65
+ try:
66
+ # Save image temporarily
67
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.jpg') as tmp_file:
68
+ image.save(tmp_file.name)
69
+ temp_path = Path(tmp_file.name)
70
+
71
+ # Run prediction
72
+ start_time = time.time()
73
+ predictions, probabilities, inference_time = self.predictor.predict(temp_path, top_k)
74
+ total_time = (time.time() - start_time) * 1000
75
+
76
+ # Clean up
77
+ temp_path.unlink(missing_ok=True)
78
+
79
+ # Format results for Gradio
80
+ confidences = {}
81
+ for pred, prob in zip(predictions, probabilities):
82
+ confidences[pred.replace('_', ' ').title()] = float(prob)
83
+
84
+ # Create info text
85
+ info_lines = [
86
+ f"🔍 **Prediction Results**",
87
+ f"⚡ **Inference Time**: {inference_time:.2f}ms",
88
+ f"🕒 **Total Time**: {total_time:.2f}ms",
89
+ f"🧠 **Model**: EfficientNet-B0 (ONNX)",
90
+ f"📊 **Top Prediction**: {predictions[0].replace('_', ' ').title()} ({probabilities[0]*100:.1f}%)"
91
+ ]
92
+
93
+ info_text = "\n".join(info_lines)
94
+
95
+ return confidences, info_text
96
+
97
+ except Exception as e:
98
+ temp_path.unlink(missing_ok=True)
99
+ return {}, f"❌ **Error**: {str(e)}"
100
+
101
+ def get_examples(self) -> List[List]:
102
+ """Get example images for the demo."""
103
+ examples_dir = project_root / "food-101/food-101/images"
104
+ examples = []
105
+
106
+ # Select a few example images from different classes
107
+ example_classes = ['pizza', 'hamburger', 'ice_cream']
108
+
109
+ for class_name in example_classes:
110
+ class_dir = examples_dir / class_name
111
+ if class_dir.exists():
112
+ # Get first image from class
113
+ images = list(class_dir.glob("*.jpg"))
114
+ if images:
115
+ # Format: [image_path, top_k_value]
116
+ examples.append([str(images[0]), 5])
117
+
118
+ # If no examples found, return empty list (Gradio will handle gracefully)
119
+ return examples if examples else []
120
+
121
+ def create_interface(self) -> gr.Interface:
122
+ """Create and return the Gradio interface."""
123
+
124
+ # Custom CSS for better styling
125
+ css = """
126
+ .main-header {
127
+ text-align: center;
128
+ background: linear-gradient(90deg, #ff6b6b, #4ecdc4);
129
+ -webkit-background-clip: text;
130
+ -webkit-text-fill-color: transparent;
131
+ font-size: 2.5em;
132
+ font-weight: bold;
133
+ margin-bottom: 20px;
134
+ }
135
+ .info-box {
136
+ background-color: #f0f8ff;
137
+ border-left: 5px solid #4ecdc4;
138
+ padding: 15px;
139
+ margin: 10px 0;
140
+ border-radius: 5px;
141
+ }
142
+ """
143
+
144
+ # Interface description
145
+ description = """
146
+ ## 🍕 Food-101 Image Classifier
147
+
148
+ Upload an image of food and get AI-powered predictions! This demo uses a fine-tuned **EfficientNet-B0** model
149
+ trained on the Food-101 dataset to classify 101 different types of food.
150
+
151
+ ### 🎯 **Model Performance**
152
+ - **Accuracy**: 84.49% on test set
153
+ - **Inference Speed**: ~7ms per image
154
+ - **Classes**: 101 different food types
155
+
156
+ ### 🚀 **How to use**
157
+ 1. Upload an image or try one of our examples
158
+ 2. Adjust the number of top predictions (1-10)
159
+ 3. Click Submit to get predictions with confidence scores!
160
+ """
161
+
162
+ # Create the interface
163
+ interface = gr.Interface(
164
+ fn=self.predict_image,
165
+ inputs=[
166
+ gr.Image(
167
+ type="pil",
168
+ label="📸 Upload Food Image",
169
+ height=300
170
+ ),
171
+ gr.Slider(
172
+ minimum=1,
173
+ maximum=10,
174
+ value=5,
175
+ step=1,
176
+ label="🔢 Number of Predictions"
177
+ )
178
+ ],
179
+ outputs=[
180
+ gr.Label(
181
+ label="🏆 Predictions & Confidence Scores",
182
+ num_top_classes=10
183
+ ),
184
+ gr.Markdown(
185
+ label="📊 Prediction Details"
186
+ )
187
+ ],
188
+ title="🍔 Food-101 AI Classifier",
189
+ description=description,
190
+ examples=self.get_examples(),
191
+ css=css,
192
+ theme=gr.themes.Soft(),
193
+ flagging_mode="never"
194
+ )
195
+
196
+ return interface
197
+
198
+
199
+ def main():
200
+ """Main function to launch the Gradio app."""
201
+ try:
202
+ # Initialize the app
203
+ print("[GRADIO] Initializing Food-101 Classifier App...")
204
+ app = GradioFood101App()
205
+
206
+ # Create interface
207
+ print("[GRADIO] Creating Gradio interface...")
208
+ interface = app.create_interface()
209
+
210
+ # Launch the app
211
+ print("[GRADIO] Launching app...")
212
+ interface.launch(
213
+ share=False, # Set to True to create public link
214
+ server_name="0.0.0.0",
215
+ server_port=7860,
216
+ show_error=True
217
+ )
218
+
219
+ except Exception as e:
220
+ print(f"[ERROR] Failed to launch Gradio app: {e}")
221
+ raise
222
+
223
+
224
+ if __name__ == "__main__":
225
+ main()
models/efficientnet_b0_food101.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a19fedaec8bb25ca7af8d49a71aaf2d5f71588bd96d859c252fd1e4902345179
3
+ size 16537732
requirements.txt ADDED
@@ -0,0 +1,23 @@
1
+ # Requirements for Hugging Face Spaces deployment
2
+ # Core dependencies for Food-101 classifier
3
+
4
+ # Deep Learning & Computer Vision
5
+ torch>=2.0.0
6
+ torchvision>=0.15.0
7
+ onnxruntime>=1.20.0
8
+ onnx>=1.15.0
9
+
10
+ # Image Processing
11
+ pillow>=9.0.0
12
+ opencv-python>=4.5.0
13
+ albumentations>=1.3.0
14
+
15
+ # ML & Data Science
16
+ numpy>=1.24.0
17
+ scikit-learn>=1.3.0
18
+
19
+ # Web Interface
20
+ gradio>=4.0.0
21
+
22
+ # Utilities
23
+ # pathlib ships with the Python 3 standard library; the PyPI "pathlib" backport is unnecessary
scripts/__pycache__/check_gpu.cpython-312.pyc ADDED
Binary file (1.37 kB).
 
scripts/__pycache__/evaluate.cpython-312.pyc ADDED
Binary file (9.39 kB).
 
scripts/__pycache__/gradcam.cpython-312.pyc ADDED
Binary file (9.18 kB).
 
scripts/__pycache__/predict.cpython-312.pyc ADDED
Binary file (9.5 kB).
 
scripts/__pycache__/train.cpython-312.pyc ADDED
Binary file (28.6 kB).
 
scripts/check_gpu.py ADDED
@@ -0,0 +1,21 @@
1
+ """Utility script to report CUDA GPU availability for PyTorch."""
2
+ from __future__ import annotations
3
+
4
+ import torch
5
+
6
+ def main() -> None:
7
+ has_cuda = torch.cuda.is_available()
8
+ print(f"torch.cuda.is_available(): {has_cuda}")
9
+ if has_cuda:
10
+ num_devices = torch.cuda.device_count()
11
+ print(f"Detected CUDA devices: {num_devices}")
12
+ for idx in range(num_devices):
13
+ name = torch.cuda.get_device_name(idx)
14
+ capability = torch.cuda.get_device_capability(idx)
15
+ print(f" - Device {idx}: {name} (compute capability {capability[0]}.{capability[1]})")
16
+ else:
17
+ print("No CUDA-capable GPU detected. Training will fall back to CPU.")
18
+
19
+
20
+ if __name__ == "__main__":
21
+ main()
scripts/evaluate.py ADDED
@@ -0,0 +1,136 @@
1
+ """Evaluation utilities for Food-101 classifiers (Phase 5)."""
2
+ from __future__ import annotations
3
+
4
+ import argparse
5
+ import json
6
+ from pathlib import Path
7
+ from typing import Dict, List, Sequence
8
+
9
+ import albumentations as A
10
+ import numpy as np
11
+ import torch
12
+ from albumentations.pytorch import ToTensorV2
13
+ from PIL import Image
14
+ from sklearn.metrics import classification_report, confusion_matrix, top_k_accuracy_score
15
+ from torch.utils.data import DataLoader, Dataset
16
+
17
+ from train import BaselineCNN, Sample, build_model, load_food101_splits, set_seed
18
+
19
+
20
+ class Food101EvalDataset(Dataset):
21
+ """Thin dataset wrapper that applies evaluation transforms."""
22
+
23
+ def __init__(self, samples: Sequence[Sample], transform: A.BasicTransform) -> None:
24
+ self.samples = list(samples)
25
+ self.transform = transform
26
+
27
+ def __len__(self) -> int:
28
+ return len(self.samples)
29
+
30
+ def __getitem__(self, index: int) -> Dict[str, torch.Tensor | int | str]:
31
+ sample = self.samples[index]
32
+ with sample.path.open("rb") as file:
33
+ array = np.array(Image.open(file).convert("RGB"))
34
+ tensor = self.transform(image=array)["image"]
35
+ return {"image": tensor, "label": sample.label, "path": str(sample.path)}
36
+
37
+
38
+ def parse_args() -> argparse.Namespace:
39
+ parser = argparse.ArgumentParser(description="Evaluate Food-101 checkpoints")
40
+ parser.add_argument("--data-dir", type=Path, default=Path("data/raw/food-101"), help="Dataset root")
41
+ parser.add_argument("--checkpoint", type=Path, required=True, help="Checkpoint file to evaluate")
42
+ parser.add_argument("--model", choices=["baseline", "resnet50", "efficientnet_b0"], required=True)
43
+ parser.add_argument("--split", choices=["val", "test"], default="test", help="Dataset split to use")
44
+ parser.add_argument("--batch-size", type=int, default=64)
45
+ parser.add_argument("--num-workers", type=int, default=4)
46
+ parser.add_argument("--image-size", type=int, default=224)
47
+ parser.add_argument("--seed", type=int, default=42)
48
+ parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu")
49
+ parser.add_argument("--topk", nargs="*", type=int, default=[1, 5], help="Top-k accuracies to report")
50
+ parser.add_argument("--report-json", type=Path, default=None, help="Optional path to dump JSON metrics")
51
+ return parser.parse_args()
52
+
53
+
54
+ def build_eval_transform(image_size: int) -> A.BasicTransform:
55
+ return A.Compose(
56
+ [
57
+ A.Resize(height=image_size + 32, width=image_size + 32),
58
+ A.CenterCrop(height=image_size, width=image_size),
59
+ A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
60
+ ToTensorV2(),
61
+ ]
62
+ )
63
+
64
+
65
+ def run_evaluation(args: argparse.Namespace) -> Dict[str, float]:
66
+ set_seed(args.seed)
67
+ device = torch.device(args.device)
68
+
69
+ data_dir = args.data_dir.expanduser().resolve()
70
+ train_samples, val_samples, test_samples, idx_to_class = load_food101_splits(data_dir, val_split=0.1, seed=args.seed)
71
+ class_names: List[str] = [idx_to_class[i] for i in range(len(idx_to_class))]
72
+
73
+ split_samples = val_samples if args.split == "val" else test_samples
74
+ transform = build_eval_transform(args.image_size)
75
+ dataset = Food101EvalDataset(split_samples, transform=transform)
76
+ dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=False, num_workers=args.num_workers, pin_memory=torch.cuda.is_available())
77
+
78
+ model = build_model(args.model, num_classes=len(class_names), pretrained=False, freeze_backbone=False)
79
+ try:
80
+ state_dict = torch.load(args.checkpoint, map_location=device, weights_only=True)
81
+ except TypeError:
82
+ state_dict = torch.load(args.checkpoint, map_location=device)
83
+ model.load_state_dict(state_dict)
84
+ model.to(device)
85
+ model.eval()
86
+
87
+ all_probs: List[torch.Tensor] = []
88
+ all_labels: List[int] = []
89
+
90
+ with torch.no_grad():
91
+ for batch in dataloader:
92
+ inputs = batch["image"].to(device)
93
+ outputs = model(inputs)
94
+ probs = torch.softmax(outputs, dim=1)
95
+ all_probs.append(probs.cpu())
96
+ all_labels.extend(batch["label"].tolist())
97
+
98
+ probs_tensor = torch.cat(all_probs, dim=0)
99
+ preds = probs_tensor.argmax(dim=1).numpy()
100
+ labels_np = np.array(all_labels)
101
+
102
+ report = classification_report(labels_np, preds, target_names=class_names, output_dict=True, zero_division=0)
103
+ conf_mat = confusion_matrix(labels_np, preds)
104
+
105
+ metrics: Dict[str, float] = {
106
+ "accuracy": report["accuracy"],
107
+ "macro_precision": report["macro avg"]["precision"],
108
+ "macro_recall": report["macro avg"]["recall"],
109
+ "macro_f1": report["macro avg"]["f1-score"],
110
+ "weighted_f1": report["weighted avg"]["f1-score"],
111
+ }
112
+
113
+ for k in args.topk:
114
+ metrics[f"top{k}_accuracy"] = top_k_accuracy_score(labels_np, probs_tensor, k=k, labels=list(range(len(class_names))))
115
+
116
+ if args.report_json:
117
+ args.report_json.parent.mkdir(parents=True, exist_ok=True)
118
+ with args.report_json.open("w") as f:
119
+ json.dump({"metrics": metrics, "classification_report": report, "confusion_matrix": conf_mat.tolist()}, f, indent=2)
120
+
121
+ print("=== Metrics ===")
122
+ for key, value in metrics.items():
123
+ print(f"{key}: {value:.4f}")
124
+
125
+ print("=== Confusion Matrix (sample) ===")
126
+ print(conf_mat)
127
+ return metrics
128
+
129
+
130
+ def main() -> None:
131
+ args = parse_args()
132
+ run_evaluation(args)
133
+
134
+
135
+ if __name__ == "__main__":
136
+ main()
scripts/export_model.py ADDED
@@ -0,0 +1,159 @@
1
+ """Export PyTorch models to ONNX format for optimized inference."""
2
+
3
+ import argparse
4
+ from pathlib import Path
5
+ import torch
6
+ import onnx
7
+ import onnxruntime as ort
8
+ import numpy as np
9
+
10
+ from train import build_model, load_food101_splits, set_seed
11
+
12
+
13
+ def parse_args() -> argparse.Namespace:
14
+ parser = argparse.ArgumentParser(description="Export PyTorch model to ONNX")
15
+ parser.add_argument("--checkpoint", type=Path, required=True, help="Path to PyTorch checkpoint")
16
+ parser.add_argument("--model", choices=["baseline", "resnet50", "efficientnet_b0"], required=True)
17
+ parser.add_argument("--output", type=Path, required=True, help="Output ONNX file path")
18
+ parser.add_argument("--data-dir", type=Path, default=Path("food-101/food-101"), help="Dataset root")
19
+ parser.add_argument("--input-size", type=int, default=224, help="Input image size")
20
+ parser.add_argument("--batch-size", type=int, default=1, help="Batch size for export")
21
+ parser.add_argument("--device", type=str, default="cpu", help="Device for export")
22
+ parser.add_argument("--opset-version", type=int, default=11, help="ONNX opset version")
23
+ parser.add_argument("--seed", type=int, default=42)
24
+ return parser.parse_args()
25
+
26
+
27
+ def export_to_onnx(
28
+ model: torch.nn.Module,
29
+ output_path: Path,
30
+ input_size: int,
31
+ batch_size: int,
32
+ device: torch.device,
33
+ opset_version: int = 11
34
+ ) -> None:
35
+ """Export PyTorch model to ONNX format."""
36
+
37
+ model.eval()
38
+
39
+ # Create dummy input tensor
40
+ dummy_input = torch.randn(batch_size, 3, input_size, input_size, device=device)
41
+
42
+ # Export to ONNX
43
+ output_path.parent.mkdir(parents=True, exist_ok=True)
44
+
45
+ torch.onnx.export(
46
+ model,
47
+ dummy_input,
48
+ str(output_path),
49
+ export_params=True,
50
+ opset_version=opset_version,
51
+ do_constant_folding=True,
52
+ input_names=['input'],
53
+ output_names=['output'],
54
+ dynamic_axes={
55
+ 'input': {0: 'batch_size'},
56
+ 'output': {0: 'batch_size'}
57
+ }
58
+ )
59
+
60
+ print(f"[SUCCESS] Model exported to {output_path}")
61
+
62
+
63
+ def verify_onnx_model(onnx_path: Path, pytorch_model: torch.nn.Module, input_size: int, device: torch.device) -> None:
64
+ """Verify that ONNX model produces same outputs as PyTorch model."""
65
+
66
+ # Load ONNX model
67
+ onnx_model = onnx.load(str(onnx_path))
68
+ onnx.checker.check_model(onnx_model)
69
+ print("[SUCCESS] ONNX model is valid")
70
+
71
+ # Create ONNX Runtime session
72
+ ort_session = ort.InferenceSession(str(onnx_path))
73
+
74
+ # Create test input
75
+ test_input = torch.randn(1, 3, input_size, input_size, device=device)
76
+
77
+ # PyTorch inference
78
+ pytorch_model.eval()
79
+ with torch.no_grad():
80
+ pytorch_output = pytorch_model(test_input).cpu().numpy()
81
+
82
+ # ONNX inference
83
+ onnx_input = test_input.cpu().numpy()
84
+ onnx_output = ort_session.run(['output'], {'input': onnx_input})[0]
85
+
86
+ # Compare outputs
87
+ max_diff = np.max(np.abs(pytorch_output - onnx_output))
88
+ print(f"[SUCCESS] Max difference between PyTorch and ONNX outputs: {max_diff:.6f}")
89
+
90
+ if max_diff < 1e-5:
91
+ print("[SUCCESS] ONNX model verification successful!")
92
+ else:
93
+ print("[WARNING] Large difference detected - verify model compatibility")
94
+
95
+
96
+ def get_model_info(model: torch.nn.Module, input_size: int) -> None:
97
+ """Print model information."""
98
+
99
+ # Count parameters
100
+ total_params = sum(p.numel() for p in model.parameters())
101
+ trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
102
+
103
+ print(f"Model Information:")
104
+ print(f" Total parameters: {total_params:,}")
105
+ print(f" Trainable parameters: {trainable_params:,}")
106
+ print(f" Input size: {input_size}x{input_size}x3")
107
+
108
+
109
+ def main() -> None:
110
+ args = parse_args()
111
+ set_seed(args.seed)
112
+
113
+ device = torch.device(args.device)
114
+
115
+ # Load dataset info to get number of classes
116
+ data_dir = args.data_dir.expanduser().resolve()
117
+ _, _, _, idx_to_class = load_food101_splits(data_dir, val_split=0.1, seed=args.seed)
118
+ num_classes = len(idx_to_class)
119
+
120
+ # Load model
121
+ print(f"[INFO] Loading {args.model} model...")
122
+ model = build_model(args.model, num_classes=num_classes, pretrained=False, freeze_backbone=False)
123
+
124
+ # Load checkpoint
125
+ try:
126
+ state_dict = torch.load(args.checkpoint, map_location=device, weights_only=True)
127
+ except TypeError:
128
+ state_dict = torch.load(args.checkpoint, map_location=device)
129
+
130
+ model.load_state_dict(state_dict)
131
+ model.to(device)
132
+
133
+ # Print model info
134
+ get_model_info(model, args.input_size)
135
+
136
+ # Export to ONNX
137
+ print(f"[INFO] Exporting to ONNX...")
138
+ export_to_onnx(
139
+ model=model,
140
+ output_path=args.output,
141
+ input_size=args.input_size,
142
+ batch_size=args.batch_size,
143
+ device=device,
144
+ opset_version=args.opset_version
145
+ )
146
+
147
+ # Verify ONNX model
148
+ print(f"[INFO] Verifying ONNX model...")
149
+ verify_onnx_model(args.output, model, args.input_size, device)
150
+
151
+ # Print file size
152
+ file_size_mb = args.output.stat().st_size / (1024 * 1024)
153
+ print(f"[INFO] ONNX file size: {file_size_mb:.2f} MB")
154
+
155
+ print("[SUCCESS] Export completed successfully!")
156
+
157
+
158
+ if __name__ == "__main__":
159
+ main()
scripts/gradcam.py ADDED
@@ -0,0 +1,143 @@
1
+ """Grad-CAM utilities for Food-101 models."""
2
+ from __future__ import annotations
3
+
4
+ import argparse
5
+ from pathlib import Path
6
+ from typing import Dict, List
7
+
8
+ import albumentations as A
9
+ import cv2
10
+ import numpy as np
11
+ import torch
12
+ from albumentations.pytorch import ToTensorV2
13
+ from PIL import Image
14
+
15
+ from train import build_model, load_food101_splits, set_seed
16
+
17
+
18
+ def parse_args() -> argparse.Namespace:
19
+ parser = argparse.ArgumentParser(description="Generate Grad-CAM visualizations")
20
+ parser.add_argument("--data-dir", type=Path, default=Path("data/raw/food-101"))
21
+ parser.add_argument("--checkpoint", type=Path, required=True)
22
+ parser.add_argument("--model", choices=["baseline", "resnet50", "efficientnet_b0"], required=True)
23
+ parser.add_argument("--class-index", type=int, default=None, help="Target class index for Grad-CAM (defaults to prediction)")
24
+ parser.add_argument("--image", type=Path, required=True, help="Path to input image file")
25
+ parser.add_argument("--output", type=Path, required=True, help="Output heatmap path")
26
+ parser.add_argument("--image-size", type=int, default=224)
27
+ parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu")
28
+ parser.add_argument("--seed", type=int, default=42)
29
+ parser.add_argument("--alpha", type=float, default=0.4, help="Blending factor for heatmap overlay")
30
+ return parser.parse_args()
31
+
32
+
33
+ def build_preprocess(image_size: int) -> A.BasicTransform:
34
+ return A.Compose(
35
+ [
36
+ A.Resize(height=image_size, width=image_size),
37
+ A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
38
+ ToTensorV2(),
39
+ ]
40
+ )
41
+
42
+
43
+ def get_last_conv_layer(model: torch.nn.Module) -> torch.nn.Module:
44
+ # EfficientNet B0
45
+ if hasattr(model, "features") and hasattr(model.features, "_modules"):
46
+ return model.features[-1][-1] # Last block, last layer
47
+
48
+ # ResNet50
49
+ if hasattr(model, "layer4"):
50
+ return model.layer4[-1].conv3 # type: ignore[attr-defined]
51
+
52
+ # Baseline CNN
53
+ if hasattr(model, "features"):
54
+ for module in reversed(model.features):
55
+ if isinstance(module, torch.nn.Conv2d):
56
+ return module
57
+
58
+ # Generic fallback
59
+ if hasattr(model, "classifier") and isinstance(model.classifier, torch.nn.Sequential):
60
+ for module in reversed(model.classifier):
61
+ if isinstance(module, torch.nn.Conv2d):
62
+ return module
63
+
64
+ raise RuntimeError("Could not automatically determine last convolutional layer")
65
+
66
+
67
+ def generate_gradcam(
68
+ model: torch.nn.Module,
69
+ image_tensor: torch.Tensor,
70
+ target_class: int | None,
71
+ ) -> np.ndarray:
72
+ gradients: List[torch.Tensor] = []
73
+ activations: List[torch.Tensor] = []
74
+
75
+ def backward_hook(module: torch.nn.Module, grad_input: tuple[torch.Tensor, ...], grad_output: tuple[torch.Tensor, ...]) -> None:
76
+ gradients.append(grad_output[0])
77
+
78
+ def forward_hook(module: torch.nn.Module, args: tuple[torch.Tensor, ...], output: torch.Tensor) -> None:
79
+ activations.append(output)
80
+
81
+ target_layer = get_last_conv_layer(model)
82
+ handle_fwd = target_layer.register_forward_hook(forward_hook)
83
+ handle_bwd = target_layer.register_full_backward_hook(backward_hook)
84
+
85
+ try:
86
+ output = model(image_tensor)
87
+ if target_class is None:
88
+ target_class = int(output.argmax(dim=1).item())
89
+ loss = output[:, target_class].sum()
90
+ model.zero_grad()
91
+ loss.backward()
92
+
93
+ grads = gradients[0]
94
+ acts = activations[0]
95
+ weights = grads.mean(dim=(2, 3), keepdim=True)
96
+ cam = torch.relu((weights * acts).sum(dim=1, keepdim=True))
97
+ cam = torch.nn.functional.interpolate(cam, size=image_tensor.shape[2:], mode="bilinear", align_corners=False)
98
+ cam = cam.squeeze().detach().cpu().numpy()
99
+ cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
100
+ return cam
101
+ finally:
102
+ handle_fwd.remove()
103
+ handle_bwd.remove()
104
+
105
+
106
+ def overlay_heatmap(original: np.ndarray, heatmap: np.ndarray, alpha: float) -> np.ndarray:
107
+ # Resize heatmap to match original image size
108
+ heatmap_resized = cv2.resize(heatmap, (original.shape[1], original.shape[0]))
109
+ heatmap_color = cv2.applyColorMap((heatmap_resized * 255).astype(np.uint8), cv2.COLORMAP_JET)
110
+ heatmap_color = cv2.cvtColor(heatmap_color, cv2.COLOR_BGR2RGB)
111
+ overlay = cv2.addWeighted(heatmap_color, alpha, original, 1 - alpha, 0)
112
+ return overlay
113
+
114
+
115
+ def main() -> None:
116
+ args = parse_args()
117
+ set_seed(args.seed)
118
+
119
+ device = torch.device(args.device)
120
+ model = build_model(args.model, num_classes=101, pretrained=False, freeze_backbone=False)
121
+ try:
122
+ state_dict = torch.load(args.checkpoint, map_location=device, weights_only=True)
123
+ except TypeError:
124
+ state_dict = torch.load(args.checkpoint, map_location=device)
125
+ model.load_state_dict(state_dict)
126
+ model.to(device)
127
+ model.eval()
128
+
129
+ preprocess = build_preprocess(args.image_size)
130
+ with args.image.open("rb") as f:
131
+ original = np.array(Image.open(f).convert("RGB"))
132
+ tensor = preprocess(image=original)["image"].unsqueeze(0).to(device)
133
+
134
+ heatmap = generate_gradcam(model, tensor, args.class_index)
135
+ overlay = overlay_heatmap(original, heatmap, alpha=args.alpha)
136
+
137
+ args.output.parent.mkdir(parents=True, exist_ok=True)
138
+ Image.fromarray(overlay).save(args.output)
139
+ print(f"Grad-CAM saved to {args.output}")
140
+
141
+
142
+ if __name__ == "__main__":
143
+ main()
scripts/predict.py ADDED
@@ -0,0 +1,175 @@
1
+ """Fast inference script using ONNX model for Food-101 classification."""
2
+
3
+ import argparse
4
+ import time
5
+ from pathlib import Path
6
+ from typing import Dict, List, Tuple
7
+
8
+ import albumentations as A
9
+ import numpy as np
10
+ import onnxruntime as ort
11
+ from albumentations.pytorch import ToTensorV2
12
+ from PIL import Image
13
+
14
+ from train import load_food101_splits
15
+
16
+
17
+ class Food101Predictor:
18
+ """Fast ONNX-based predictor for Food-101 classification."""
19
+
20
+ def __init__(self, onnx_path: Path, class_names: List[str], providers: List[str] = None):
21
+ """Initialize predictor with ONNX model."""
22
+
23
+ self.class_names = class_names
24
+ self.num_classes = len(class_names)
25
+
26
+ # Initialize ONNX Runtime session with optimal providers
27
+ if providers is None:
28
+ providers = ['CPUExecutionProvider']
29
+
30
+ self.session = ort.InferenceSession(str(onnx_path), providers=providers)
31
+
32
+ # Get input/output info
33
+ self.input_name = self.session.get_inputs()[0].name
34
+ self.output_name = self.session.get_outputs()[0].name
35
+
36
+ # Create preprocessing transform
37
+ self.transform = A.Compose([
38
+ A.Resize(height=224, width=224),
39
+ A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
40
+ ToTensorV2(),
41
+ ])
42
+
43
+ print(f"[INFO] Predictor initialized with {len(class_names)} classes")
44
+ print(f"[INFO] ONNX Runtime providers: {self.session.get_providers()}")
45
+
46
+ def preprocess_image(self, image_path: Path) -> np.ndarray:
47
+ """Preprocess image for inference."""
48
+
49
+ with image_path.open("rb") as f:
50
+ image = Image.open(f).convert("RGB")
51
+
52
+ # Convert to numpy array
53
+ image_array = np.array(image)
54
+
55
+ # Apply transforms
56
+ transformed = self.transform(image=image_array)
57
+ tensor = transformed["image"]
58
+
59
+ # Add batch dimension and convert to numpy
60
+ batch = tensor.unsqueeze(0).numpy()
61
+ return batch
62
+
63
+ def predict(self, image_path: Path, top_k: int = 5) -> Tuple[List[str], List[float], float]:
64
+ """
65
+ Predict food class for given image.
66
+
67
+ Returns:
68
+ predictions: List of top-k class names
69
+ probabilities: List of top-k probabilities
70
+ inference_time: Time in milliseconds
71
+ """
72
+
73
+ # Preprocess
74
+ start_time = time.time()
75
+ input_batch = self.preprocess_image(image_path)
76
+
77
+ # Run inference
78
+ outputs = self.session.run([self.output_name], {self.input_name: input_batch})[0]
79
+
80
+ # Apply softmax to get probabilities
81
+ exp_outputs = np.exp(outputs - np.max(outputs, axis=1, keepdims=True))
82
+ probabilities = exp_outputs / np.sum(exp_outputs, axis=1, keepdims=True)
83
+
84
+ # Get top-k predictions
85
+ top_indices = np.argsort(probabilities[0])[::-1][:top_k]
86
+ top_probs = probabilities[0][top_indices].tolist()
87
+ top_classes = [self.class_names[i] for i in top_indices]
88
+
89
+ inference_time = (time.time() - start_time) * 1000 # Convert to ms
90
+
91
+ return top_classes, top_probs, inference_time
92
+
93
+ def predict_batch(self, image_paths: List[Path], top_k: int = 5) -> List[Dict]:
94
+ """Predict multiple images at once."""
95
+
96
+ results = []
97
+ for image_path in image_paths:
98
+ classes, probs, time_ms = self.predict(image_path, top_k)
99
+ results.append({
100
+ 'image_path': str(image_path),
101
+ 'predictions': classes,
102
+ 'probabilities': probs,
103
+ 'inference_time_ms': time_ms
104
+ })
105
+
106
+ return results
107
+
108
+
109
+ def parse_args() -> argparse.Namespace:
110
+ parser = argparse.ArgumentParser(description="Fast inference with ONNX model")
111
+ parser.add_argument("--model", type=Path, required=True, help="Path to ONNX model file")
112
+ parser.add_argument("--image", type=Path, required=True, help="Path to input image")
113
+ parser.add_argument("--data-dir", type=Path, default=Path("food-101/food-101"), help="Dataset root for class names")
114
+ parser.add_argument("--top-k", type=int, default=5, help="Number of top predictions to show")
115
+ parser.add_argument("--providers", nargs="*", default=None,
116
+ help="ONNX Runtime providers (e.g., CPUExecutionProvider)")
117
+ parser.add_argument("--seed", type=int, default=42)
118
+ return parser.parse_args()
119
+
120
+
121
+ def benchmark_inference(predictor: Food101Predictor, image_path: Path, num_runs: int = 10) -> None:
122
+ """Benchmark inference speed."""
123
+
124
+ print(f"[INFO] Benchmarking inference with {num_runs} runs...")
125
+
126
+ times = []
127
+ for i in range(num_runs):
128
+ _, _, inference_time = predictor.predict(image_path, top_k=1)
129
+ times.append(inference_time)
130
+ if i == 0:
131
+ print(f"[INFO] First run (cold start): {inference_time:.2f}ms")
132
+
133
+ # Statistics
134
+ mean_time = np.mean(times[1:]) # Exclude cold start
135
+ std_time = np.std(times[1:])
136
+ min_time = np.min(times[1:])
137
+ max_time = np.max(times[1:])
138
+
139
+ print(f"[BENCHMARK] Average inference time: {mean_time:.2f} ± {std_time:.2f}ms")
140
+ print(f"[BENCHMARK] Min: {min_time:.2f}ms, Max: {max_time:.2f}ms")
141
+
142
+ if mean_time < 100:
143
+ print(f"[SUCCESS] Target latency achieved! ({mean_time:.2f}ms < 100ms)")
144
+ else:
145
+ print(f"[WARNING] Target latency not met ({mean_time:.2f}ms >= 100ms)")
146
+
147
+
148
+ def main() -> None:
149
+ args = parse_args()
150
+
151
+ # Load class names from dataset
152
+ data_dir = args.data_dir.expanduser().resolve()
153
+ _, _, _, idx_to_class = load_food101_splits(data_dir, val_split=0.1, seed=args.seed)
154
+ class_names = [idx_to_class[i] for i in range(len(idx_to_class))]
155
+
156
+ # Initialize predictor
157
+ predictor = Food101Predictor(args.model, class_names, providers=args.providers)
158
+
159
+ # Run prediction
160
+ print(f"[INFO] Predicting image: {args.image}")
161
+ predictions, probabilities, inference_time = predictor.predict(args.image, args.top_k)
162
+
163
+ # Display results
164
+ print(f"\n[RESULTS] Inference time: {inference_time:.2f}ms")
165
+ print("Top predictions:")
166
+ for i, (class_name, prob) in enumerate(zip(predictions, probabilities), 1):
167
+ print(f" {i}. {class_name}: {prob:.4f} ({prob*100:.2f}%)")
168
+
169
+ # Run benchmark
170
+ print()
171
+ benchmark_inference(predictor, args.image)
172
+
173
+
174
+ if __name__ == "__main__":
175
+ main()
scripts/train.py ADDED
@@ -0,0 +1,590 @@
1
+ """Training script for Food-101 supporting baseline, transfer, and advanced training utilities."""
2
+ from __future__ import annotations
3
+
4
+ import argparse
5
+ import csv
6
+ import json
7
+ import random
8
+ from dataclasses import dataclass
9
+ from pathlib import Path
10
+ from typing import Dict, Iterable, List, Optional, Sequence, Tuple
11
+
12
+ import albumentations as A
13
+ import numpy as np
14
+ import torch
15
+ from albumentations.pytorch import ToTensorV2
16
+ from PIL import Image
17
+ from torch import nn
18
+ from torch.amp import GradScaler, autocast
19
+ from torch.optim import Adam
20
+ from torch.utils.data import DataLoader, Dataset
21
+ from torchvision import models
22
+ from torchvision.models import EfficientNet_B0_Weights, ResNet50_Weights
23
+
24
+ try: # Optional dependency used for experiment tracking.
25
+ import wandb # type: ignore
26
+ except ImportError: # pragma: no cover - handled at runtime when library missing.
27
+ wandb = None
28
+
29
+
30
+ @dataclass(frozen=True)
31
+ class Sample:
32
+ """Minimal container storing an image path and class index."""
33
+
34
+ path: Path
35
+ label: int
36
+
37
+
38
+ class Food101Dataset(Dataset):
39
+ """Custom Dataset that loads Food-101 images and applies augmentations."""
40
+
41
+ def __init__(self, samples: Sequence[Sample], transform: A.BasicTransform | None = None) -> None:
42
+ self.samples = list(samples)
43
+ self.transform = transform
44
+
45
+ def __len__(self) -> int:
46
+ return len(self.samples)
47
+
48
+ def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
49
+ sample = self.samples[index]
50
+ with sample.path.open("rb") as file:
51
+ # Convert PIL image to NumPy array so Albumentations can process it.
52
+ array = np.array(Image.open(file).convert("RGB"))
53
+ if self.transform is not None:
54
+ array = self.transform(image=array)["image"]
55
+ return array, sample.label
56
+
57
+
58
+ class BaselineCNN(nn.Module):
59
+ """Lightweight CNN baseline with three feature stages and global pooling."""
60
+
61
+ def __init__(self, num_classes: int) -> None:
62
+ super().__init__()
63
+ self.features = nn.Sequential(
64
+ _conv_block(3, 32),
65
+ _conv_block(32, 64),
66
+ nn.MaxPool2d(2),
67
+ _conv_block(64, 128),
68
+ nn.MaxPool2d(2),
69
+ _conv_block(128, 256),
70
+ nn.MaxPool2d(2),
71
+ )
72
+ self.classifier = nn.Sequential(
73
+ nn.AdaptiveAvgPool2d(1),
74
+ nn.Flatten(),
75
+ nn.Dropout(0.3),
76
+ nn.Linear(256, num_classes),
77
+ )
78
+
79
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
80
+ x = self.features(x)
81
+ return self.classifier(x)
82
+
83
+
84
+ def _conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
85
+ """Creates a Conv-BN-ReLU block reused across the network."""
86
+
87
+ return nn.Sequential(
88
+ nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
89
+ nn.BatchNorm2d(out_channels),
90
+ nn.ReLU(inplace=True),
91
+ )
92
+
93
+
94
+ def parse_args() -> argparse.Namespace:
95
+ """Parses command line arguments controlling training behavior."""
96
+
97
+ parser = argparse.ArgumentParser(description="Train Food-101 image classifiers")
98
+ parser.add_argument("--data-dir", type=Path, default=Path("data/raw/food-101"), help="Root directory of Food-101 dataset")
99
+ parser.add_argument("--model", type=str, choices=["baseline", "resnet50", "efficientnet_b0"], default="baseline", help="Model architecture to train")
100
+ parser.add_argument("--epochs", type=int, default=10, help="Number of training epochs")
101
+ parser.add_argument("--batch-size", type=int, default=64, help="Mini-batch size")
102
+ parser.add_argument("--learning-rate", type=float, default=1e-3, help="Optimizer learning rate")
103
+ parser.add_argument("--val-split", type=float, default=0.1, help="Fraction of train set used for validation")
104
+ parser.add_argument("--num-workers", type=int, default=4, help="DataLoader worker processes")
105
+ parser.add_argument("--image-size", type=int, default=224, help="Square image size fed to the network")
106
+ parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
107
+ parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu", help="Computation device")
108
+ parser.add_argument("--pretrained", action="store_true", default=None, help="Use pretrained weights when available (transfer models)")
109
+ parser.add_argument("--no-pretrained", action="store_false", dest="pretrained", help="Disable pretrained weights for transfer models")
110
+ parser.add_argument("--freeze-backbone", action="store_true", help="Freeze feature extractor when using transfer models")
111
+ parser.add_argument("--experiment-name", type=str, default=None, help="Optional experiment name for checkpoints/logs")
112
+ parser.add_argument("--checkpoint-dir", type=Path, default=Path("checkpoints"), help="Where to store model checkpoints")
113
+ parser.add_argument("--log-dir", type=Path, default=Path("logs"), help="Where to store training logs")
114
+ parser.add_argument("--early-stop-patience", type=int, default=5, help="Epochs to wait before stopping after no validation improvement")
115
+ parser.add_argument("--early-stop-min-delta", type=float, default=0.0, help="Minimum change to qualify as an improvement")
116
+ parser.add_argument("--early-stop-metric", choices=["accuracy", "loss"], default="accuracy", help="Validation metric used for early stopping")
117
+ parser.add_argument("--grad-clip-norm", type=float, default=None, help="Gradient clipping norm (L2). Disabled if not set")
118
+ parser.add_argument("--use-amp", action="store_true", help="Enable mixed precision training (requires CUDA)")
119
+ parser.add_argument("--wandb", dest="use_wandb", action="store_true", help="Log metrics to Weights & Biases")
120
+ parser.add_argument("--no-wandb", dest="use_wandb", action="store_false", help="Disable Weights & Biases logging")
121
+ parser.add_argument("--wandb-project", type=str, default=None, help="Weights & Biases project name")
122
+ parser.add_argument("--wandb-entity", type=str, default=None, help="Weights & Biases entity (team/user)")
123
+ parser.add_argument("--wandb-run-name", type=str, default=None, help="Weights & Biases run name override")
124
+ args = parser.parse_args()
125
+
126
+ # Default to pretrained weights for transfer models unless user explicitly disables them.
127
+ if args.pretrained is None:
128
+ args.pretrained = args.model != "baseline"
129
+ if not hasattr(args, "use_wandb"):
130
+ args.use_wandb = False
131
+ return args
132
+
133
+
134
+ def set_seed(seed: int) -> None:
135
+ """Fixes random seeds for reproducibility across libraries."""
136
+
137
+ random.seed(seed)
138
+ np.random.seed(seed)
139
+ torch.manual_seed(seed)
140
+ if torch.cuda.is_available():
141
+ torch.cuda.manual_seed_all(seed)
142
+ torch.backends.cudnn.deterministic = True
143
+ torch.backends.cudnn.benchmark = False
144
+
145
+
146
+ def build_transforms(image_size: int) -> Tuple[A.BasicTransform, A.BasicTransform]:
147
+ """Constructs augmentation pipelines for train and evaluation splits."""
148
+
149
+ normalize = A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
150
+ train_transform = A.Compose(
151
+ [
152
+ A.RandomResizedCrop(
153
+ size=(image_size, image_size),
154
+ scale=(0.8, 1.0),
155
+ ratio=(0.75, 1.33),
156
+ ),
157
+ A.HorizontalFlip(p=0.5),
158
+ A.Affine(
159
+ scale=(0.9, 1.1),
160
+ translate_percent=(-0.05, 0.05),
161
+ rotate=(-15, 15),
162
+ shear=(0.0, 0.0),
163
+ p=0.3,
164
+ ),
165
+ A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=0.3),
166
+ normalize,
167
+ ToTensorV2(),
168
+ ]
169
+ )
170
+ eval_transform = A.Compose(
171
+ [
172
+ A.Resize(height=image_size + 32, width=image_size + 32),
173
+ A.CenterCrop(height=image_size, width=image_size),
174
+ normalize,
175
+ ToTensorV2(),
176
+ ]
177
+ )
178
+ return train_transform, eval_transform
179
+
180
+
181
+ def build_model(
182
+ model_name: str,
183
+ num_classes: int,
184
+ pretrained: bool,
185
+ freeze_backbone: bool,
186
+ ) -> nn.Module:
187
+ """Factory that returns the requested architecture configured for Food-101."""
188
+
189
+ if model_name == "baseline":
190
+ model = BaselineCNN(num_classes=num_classes)
191
+ elif model_name == "resnet50":
192
+ weights = ResNet50_Weights.DEFAULT if pretrained else None
193
+ model = models.resnet50(weights=weights)
194
+ if freeze_backbone:
195
+ for param in model.parameters():
196
+ param.requires_grad = False
197
+ in_features = model.fc.in_features
198
+ model.fc = nn.Linear(in_features, num_classes)
199
+ elif model_name == "efficientnet_b0":
200
+ weights = EfficientNet_B0_Weights.DEFAULT if pretrained else None
201
+ model = models.efficientnet_b0(weights=weights)
202
+ if freeze_backbone:
203
+ for param in model.parameters():
204
+ param.requires_grad = False
205
+ in_features = model.classifier[-1].in_features
206
+ model.classifier[-1] = nn.Linear(in_features, num_classes)
207
+ else:
208
+ raise ValueError(f"Unsupported model: {model_name}")
209
+
210
+ return model
211
+
212
+
+ def load_food101_splits(data_dir: Path, val_split: float, seed: int) -> Tuple[List[Sample], List[Sample], List[Sample], Dict[int, str]]:
+     """Loads Food-101 metadata and returns train/val/test splits."""
+
+     images_dir = data_dir / "images"
+     meta_dir = data_dir / "meta"
+     classes = _read_classes(meta_dir / "classes.txt")
+     class_to_idx = {name: idx for idx, name in enumerate(classes)}
+
+     with (meta_dir / "train.json").open() as f:
+         train_meta = json.load(f)
+     with (meta_dir / "test.json").open() as f:
+         test_meta = json.load(f)
+
+     rng = random.Random(seed)
+
+     train_samples: List[Sample] = []
+     val_samples: List[Sample] = []
+     for cls_name, items in train_meta.items():
+         paths = list(items)
+         rng.shuffle(paths)
+         val_count = max(1, int(len(paths) * val_split))
+         val_subset = paths[:val_count]
+         train_subset = paths[val_count:]
+         train_samples.extend(_build_samples(train_subset, images_dir, class_to_idx[cls_name]))
+         val_samples.extend(_build_samples(val_subset, images_dir, class_to_idx[cls_name]))
+
+     test_samples = []
+     for cls_name, items in test_meta.items():
+         test_samples.extend(_build_samples(items, images_dir, class_to_idx[cls_name]))
+
+     idx_to_class = {idx: name for name, idx in class_to_idx.items()}
+     return train_samples, val_samples, test_samples, idx_to_class
+
+
+ def _build_samples(items: Iterable[str], images_dir: Path, label: int) -> List[Sample]:
+     """Creates Sample objects from relative paths and a target label."""
+
+     samples = []
+     for item in items:
+         path = images_dir / f"{item}.jpg"
+         samples.append(Sample(path=path, label=label))
+     return samples
+
+
+ def _read_classes(path: Path) -> List[str]:
+     """Reads the list of Food-101 class names from disk."""
+
+     with path.open() as handle:
+         return [line.strip() for line in handle if line.strip()]
+
+
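These loaders assume the standard Food-101 archive layout, roughly as sketched below; train.json and test.json map each class name to relative image paths without the .jpg suffix, which is why _build_samples appends it:

```
food-101/
├── images/
│   └── <class_name>/<image_id>.jpg
└── meta/
    ├── classes.txt   # one class name per line
    ├── train.json    # {"<class_name>": ["<class_name>/<image_id>", ...], ...}
    └── test.json     # same structure as train.json
```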
+ def create_dataloaders(
+     train_samples: Sequence[Sample],
+     val_samples: Sequence[Sample],
+     test_samples: Sequence[Sample],
+     train_transform: A.BasicTransform,
+     eval_transform: A.BasicTransform,
+     batch_size: int,
+     num_workers: int,
+ ) -> Tuple[DataLoader, DataLoader, DataLoader]:
+     """Wraps datasets with PyTorch DataLoaders."""
+
+     pin_memory = torch.cuda.is_available()
+     train_dataset = Food101Dataset(train_samples, transform=train_transform)
+     val_dataset = Food101Dataset(val_samples, transform=eval_transform)
+     test_dataset = Food101Dataset(test_samples, transform=eval_transform)
+
+     train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=pin_memory)
+     val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=pin_memory)
+     test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=pin_memory)
+     return train_loader, val_loader, test_loader
+
+
+ def write_metrics(log_path: Path, rows: Sequence[Dict[str, object]]) -> None:
+     """Appends metric rows to the CSV log."""
+
+     log_path.parent.mkdir(parents=True, exist_ok=True)
+     file_exists = log_path.exists()
+     fieldnames = ["model", "experiment", "epoch", "split", "loss", "accuracy"]
+     with log_path.open("a", newline="") as csvfile:
+         writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+         if not file_exists:
+             writer.writeheader()
+         writer.writerows(rows)
+
+
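With those field names, each experiment log is a flat CSV along these lines (the numbers are placeholders, not measured results):

```
model,experiment,epoch,split,loss,accuracy
efficientnet_b0,efficientnet_b0,1,train,1.2345,0.6789
efficientnet_b0,efficientnet_b0,1,val,1.1000,0.7000
```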
+ def maybe_init_wandb(args: argparse.Namespace, config_extra: Dict[str, object]) -> Optional[object]:
+     """Initializes Weights & Biases run if requested and available."""
+
+     if not args.use_wandb:
+         return None
+     if args.wandb_project is None:
+         print("[wandb] Project not specified; skipping W&B logging.")
+         return None
+     if wandb is None:
+         print("[wandb] Package not installed; skipping W&B logging.")
+         return None
+
+     run_name = args.wandb_run_name or args.experiment_name or args.model
+     config = {k: (str(v) if isinstance(v, Path) else v) for k, v in vars(args).items()}
+     config.update(config_extra)
+     run = wandb.init(project=args.wandb_project, entity=args.wandb_entity, name=run_name, config=config)
+     return run
+
+
+ def train_one_epoch(
+     model: nn.Module,
+     dataloader: DataLoader,
+     criterion: nn.Module,
+     optimizer: torch.optim.Optimizer,
+     device: torch.device,
+     scaler: GradScaler,
+     grad_clip_norm: Optional[float],
+     amp_enabled: bool,
+     clip_params: Sequence[torch.nn.Parameter],
+     amp_device_type: str,
+ ) -> Tuple[float, float]:
+     """Runs one training epoch and returns loss and accuracy."""
+
+     model.train()
+     running_loss = 0.0
+     correct = 0
+     total = 0
+
+     for inputs, targets in dataloader:
+         inputs = inputs.to(device)
+         targets = targets.to(device)
+
+         # Forward + backward pass and optimizer update.
+         optimizer.zero_grad()
+         with autocast(amp_device_type, enabled=amp_enabled):
+             outputs = model(inputs)
+             loss = criterion(outputs, targets)
+
+         if scaler.is_enabled():
+             scaler.scale(loss).backward()
+             if grad_clip_norm is not None:
+                 scaler.unscale_(optimizer)
+                 torch.nn.utils.clip_grad_norm_(clip_params, grad_clip_norm)
+             scaler.step(optimizer)
+             scaler.update()
+         else:
+             loss.backward()
+             if grad_clip_norm is not None:
+                 torch.nn.utils.clip_grad_norm_(clip_params, grad_clip_norm)
+             optimizer.step()
+
+         running_loss += loss.item() * inputs.size(0)
+         preds = outputs.argmax(dim=1)
+         correct += (preds == targets).sum().item()
+         total += targets.size(0)
+
+     epoch_loss = running_loss / max(total, 1)
+     epoch_acc = correct / max(total, 1)
+     return epoch_loss, epoch_acc
+
+
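One detail worth noting in the AMP branch above: gradients are unscaled before clipping, so the clip threshold applies to true gradient magnitudes rather than loss-scaled ones. A minimal, CPU-safe sketch of that ordering, assuming the recent torch.amp GradScaler/autocast API that the calls above already use; the model and data here are made up:

```python
# Standalone sketch of the unscale-then-clip ordering used in train_one_epoch.
import torch
from torch import nn
from torch.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_enabled = device == "cuda"
model = nn.Linear(8, 3).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler(device, enabled=amp_enabled)

x = torch.randn(4, 8, device=device)
y = torch.randint(0, 3, (4,), device=device)
optimizer.zero_grad()
with autocast(device, enabled=amp_enabled):
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # unscale first so the threshold is in true gradient units
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)      # skips the update if inf/NaN gradients are found
scaler.update()
```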
+ def evaluate(
+     model: nn.Module,
+     dataloader: DataLoader,
+     criterion: nn.Module,
+     device: torch.device,
+     amp_enabled: bool,
+     amp_device_type: str,
+ ) -> Tuple[float, float]:
+     """Evaluates the model and returns loss and accuracy."""
+
+     model.eval()
+     running_loss = 0.0
+     correct = 0
+     total = 0
+
+     with torch.no_grad():
+         for inputs, targets in dataloader:
+             inputs = inputs.to(device)
+             targets = targets.to(device)
+             with autocast(amp_device_type, enabled=amp_enabled):
+                 outputs = model(inputs)
+                 loss = criterion(outputs, targets)
+             running_loss += loss.item() * inputs.size(0)
+             preds = outputs.argmax(dim=1)
+             correct += (preds == targets).sum().item()
+             total += targets.size(0)
+
+     epoch_loss = running_loss / max(total, 1)
+     epoch_acc = correct / max(total, 1)
+     return epoch_loss, epoch_acc
+
+
+ def main() -> None:
+     args = parse_args()
+     set_seed(args.seed)
+
+     data_dir = args.data_dir.expanduser().resolve()
+     if not data_dir.exists():
+         raise FileNotFoundError(f"Dataset directory not found: {data_dir}")
+
+     # Build samples for each split based on Food-101 metadata.
+     train_samples, val_samples, test_samples, idx_to_class = load_food101_splits(data_dir, args.val_split, args.seed)
+     num_classes = len(idx_to_class)
+
+     # Prepare augmentations and dataloaders.
+     train_transform, eval_transform = build_transforms(args.image_size)
+     train_loader, val_loader, test_loader = create_dataloaders(
+         train_samples,
+         val_samples,
+         test_samples,
+         train_transform,
+         eval_transform,
+         args.batch_size,
+         args.num_workers,
+     )
+
+     # Initialize model, loss, and optimizer on the desired device.
+     device = torch.device(args.device)
+     model = build_model(
+         model_name=args.model,
+         num_classes=num_classes,
+         pretrained=args.pretrained,
+         freeze_backbone=args.freeze_backbone,
+     ).to(device)
+     criterion = nn.CrossEntropyLoss()
+     trainable_params = [p for p in model.parameters() if p.requires_grad]
+     if not trainable_params:
+         trainable_params = list(model.parameters())
+     optimizer = Adam(trainable_params, lr=args.learning_rate)
+
+     amp_enabled = args.use_amp and device.type == "cuda"
+     amp_device_type = "cuda" if device.type == "cuda" else "cpu"
+     scaler = GradScaler(amp_device_type, enabled=amp_enabled)
+     clip_params: Sequence[torch.nn.Parameter] = trainable_params
+
+     checkpoint_dir = args.checkpoint_dir
+     checkpoint_dir.mkdir(parents=True, exist_ok=True)
+
+     experiment_name = args.experiment_name or args.model
+     checkpoint_path = checkpoint_dir / f"{experiment_name}.pth"
+     log_path = args.log_dir / f"{experiment_name}.csv"
+
+     monitor_max = args.early_stop_metric == "accuracy"
+     best_metric = -float("inf") if monitor_max else float("inf")
+     patience_counter = 0
+     best_val_acc = 0.0
+     epochs_completed = 0
+
+     wandb_run = maybe_init_wandb(
+         args,
+         {
+             "num_classes": num_classes,
+             "train_samples": len(train_samples),
+             "val_samples": len(val_samples),
+             "test_samples": len(test_samples),
+             "amp_enabled": amp_enabled,
+         },
+     )
+
+     # Standard training loop with validation monitoring.
+     for epoch in range(1, args.epochs + 1):
+         train_loss, train_acc = train_one_epoch(
+             model,
+             train_loader,
+             criterion,
+             optimizer,
+             device,
+             scaler,
+             args.grad_clip_norm,
+             amp_enabled,
+             clip_params,
+             amp_device_type,
+         )
+         val_loss, val_acc = evaluate(model, val_loader, criterion, device, amp_enabled, amp_device_type)
+         print(
+             f"Epoch {epoch:02d}: train_loss={train_loss:.4f} train_acc={train_acc:.4f} "
+             f"val_loss={val_loss:.4f} val_acc={val_acc:.4f}"
+         )
+         write_metrics(
+             log_path,
+             [
+                 {
+                     "model": args.model,
+                     "experiment": experiment_name,
+                     "epoch": epoch,
+                     "split": "train",
+                     "loss": train_loss,
+                     "accuracy": train_acc,
+                 },
+                 {
+                     "model": args.model,
+                     "experiment": experiment_name,
+                     "epoch": epoch,
+                     "split": "val",
+                     "loss": val_loss,
+                     "accuracy": val_acc,
+                 },
+             ],
+         )
+         if wandb_run is not None:
+             wandb_run.log(
+                 {
+                     "epoch": epoch,
+                     "train/loss": train_loss,
+                     "train/accuracy": train_acc,
+                     "val/loss": val_loss,
+                     "val/accuracy": val_acc,
+                 },
+                 step=epoch,
+             )
+
+         if val_acc > best_val_acc:
+             best_val_acc = val_acc
+
+         monitor_value = val_acc if monitor_max else val_loss
+         improved = (
+             monitor_value > best_metric + args.early_stop_min_delta
+             if monitor_max
+             else monitor_value < best_metric - args.early_stop_min_delta
+         )
+         if improved:
+             best_metric = monitor_value
+             patience_counter = 0
+             torch.save(model.state_dict(), checkpoint_path)
+         else:
+             patience_counter += 1
+             if patience_counter >= args.early_stop_patience:
+                 print(
+                     f"Early stopping triggered at epoch {epoch}. "
+                     f"Best {args.early_stop_metric}: {best_metric:.4f}"
+                 )
+                 epochs_completed = epoch
+                 break
+
+         epochs_completed = epoch
+     else:
+         epochs_completed = args.epochs
+
+     # Ensure we persist weights even if validation never improved.
+     if not checkpoint_path.exists():
+         torch.save(model.state_dict(), checkpoint_path)
+
+     # Load best checkpoint before final evaluation.
+     try:
+         state_dict = torch.load(checkpoint_path, map_location=device, weights_only=True)
+     except TypeError:
+         state_dict = torch.load(checkpoint_path, map_location=device)
+     model.load_state_dict(state_dict)
+
+     # Final evaluation on the held-out test set.
+     test_loss, test_acc = evaluate(model, test_loader, criterion, device, amp_enabled, amp_device_type)
+     print(f"Test metrics - loss: {test_loss:.4f} accuracy: {test_acc:.4f}")
+     write_metrics(
+         log_path,
+         [
+             {
+                 "model": args.model,
+                 "experiment": experiment_name,
+                 "epoch": epochs_completed,
+                 "split": "test",
+                 "loss": test_loss,
+                 "accuracy": test_acc,
+             }
+         ],
+     )
+
+     if wandb_run is not None:
+         wandb_run.log(
+             {
+                 "test/loss": test_loss,
+                 "test/accuracy": test_acc,
+                 "best/val_accuracy": best_val_acc,
+                 f"best/val_{args.early_stop_metric}": best_metric,
+             },
+             step=epochs_completed,
+         )
+         wandb_run.finish()
+
+
+ if __name__ == "__main__":
+     main()
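After training, the checkpoint is a plain state_dict, so it can be reloaded for a quick sanity check the same way main() does before the test pass; the path below is a placeholder for checkpoint_dir / "<experiment_name>.pth":

```python
# Illustrative reload of a trained checkpoint; the path is a placeholder,
# and build_model is the factory defined above.
import torch

model = build_model(model_name="efficientnet_b0", num_classes=101, pretrained=False, freeze_backbone=False)
state_dict = torch.load("checkpoints/efficientnet_b0.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```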