---
title: Custom YOLOv3 on Pascal VOC using PyTorch Lightning
emoji: 🦀
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.40.1
app_file: app.py
pinned: false
license: mit
---

# Custom YOLOv3 on Pascal VOC using PyTorch Lightning

## Introduction

This repository contains an application for object detection on the PASCAL Visual Object Classes (VOC) dataset using PyTorch Lightning. Object detection is implemented with a custom YOLOv3 model. The application also includes GradCAM visualization.

## Installation

To install this application, or to run it locally or on Colab, keep the following folder structure:

~~~
YoloV3_PASCAL
|── requirements.txt
|── yolov3.py
|── app.py
|── yolov3_model_without_75_mosaic.pth
|── callbacks.py
|── config.py
|── dataset.py
|── loss.py
|── main_yolov3_lightening.py
|── utils.py
|── utils_for_app.py
|── README.md
~~~

1. Clone this repository:

~~~
git lfs install
git clone https://huggingface.co/spaces/PrarthanaTS/YoloV3_PASCAL
~~~

2. Run the app.py script:

The app.py file includes a 'demo.launch()' call that launches a web-based interface using Gradio. You can access the interface by opening the provided URL in your web browser.

## Usage

The app has a single tab:

### YoloV3 on Pascal VOC Dataset with GradCAM

In this tab, you can upload your own image.

1. Input Image: Upload your own image or select one of the provided example images.

After providing the input, click the "Submit" button to see the results.

#### In the Output

1. Model prediction for object detection
2. Model prediction for the image in the GradCAM visualization

![image](https://github.com/prarthanats/ERA/assets/32382676/27730ff4-7dc0-467d-affa-c548a55f8887)

## Code Explanation

The notebook for this assignment can be accessed here: [Assignment 13](https://github.com/prarthanats/ERA/blob/main/S13_Assignment/Final_Code_without_mosaic.ipynb)

The YoloV3 module includes the following classes:

~~~
CNNBlock: A building block comprising a convolutional layer, batch normalization, and LeakyReLU activation, used to process image features in convolutional neural networks.

ResidualBlock: A module containing multiple repetitions of two stacked CNNBlocks that can apply residual connections to help feature extraction and information flow.

ScalePrediction: Generates scale-specific predictions by employing convolutional layers with varying kernel sizes, aiding object detection tasks, particularly for the YOLO architecture.

YOLOv3: A YOLO variant for object detection, integrating various CNN layers, ResidualBlocks, and ScalePredictions to provide multi-scale predictions of object classes and bounding boxes in an image.
~~~

The Lightning Module:

~~~
1. __init__(self, lr_value=0): Initializes the LightningModule, setting up the YOLOv3 model, loss function, and learning rate based on configuration. If lr_value is provided, the learning rate is set to that value; otherwise, the default learning rate from the configuration is used.
2. forward(self, x): Passes the input tensor x through the YOLOv3 model to obtain predictions.
3. configure_optimizers(self): Sets up the optimizer and a learning rate scheduler (OneCycleLR) with specified parameters for training.
4. get_all_loaders(self): Returns the training, testing, and validation data loaders obtained from the provided CSV paths.
5. train_dataloader(self): Returns the training data loader for the training phase.
6. training_step(self, batch, batch_idx): Performs a single training step on the given batch of data, calculating losses and logging them.
7. validation_dataloader(self): Returns the validation data loader for the validation phase.
8. validation_step(self, batch, batch_idx): Executes a validation step on the given batch, computes losses, and logs them.
9. test_dataloader(self): Returns the test data loader for the testing phase.
10. test_step(self, batch, batch_idx): Performs a test step on the batch of data, computes losses, and logs them.
11. on_train_start(self): If configured, loads a checkpointed model and optimizer state. Computes scaled anchor boxes using configuration parameters and sends them to the device.
~~~

The utilities module:

~~~
1. cells_to_bboxes(predictions, anchors, S, is_preds=True): Converts YOLO predictions into bounding boxes that are scaled relative to the entire image. It takes predictions, anchors, and the number of cells used for image division, and returns converted bounding boxes containing class index, object score, and coordinates.
2. intersection_over_union(boxes_preds, boxes_labels, box_format="midpoint"): Calculates the Intersection over Union (IoU) between predicted and target bounding boxes. It accepts bounding box predictions and labels along with a format indicator for the box representation, and returns the IoU value for each pair of boxes.
3. non_max_suppression(bboxes, iou_threshold, threshold, box_format="corners"): Implements Non-Maximum Suppression (NMS) to filter out overlapping bounding boxes. It takes a list of boxes with class predictions, scores, and coordinates, along with IoU and score thresholds, and returns the selected non-overlapping boxes.
4. plot_image(image, boxes): Visualizes predicted bounding boxes on an image. It takes the image and a list of boxes, each containing class prediction, confidence, and coordinates, and displays the image with bounding boxes and labels.
5. YoloCAM class: A custom class extending BaseCAM to compute Class Activation Maps (CAMs) for YOLO-like models. It uses forward passes, activations, gradients, and CAM aggregation methods to visualize which regions contribute to classification decisions.
~~~

The App:

1. Model Loading and Setup
2. The transforms pipeline is defined using the albumentations library to preprocess images for the model. It includes resizing, padding, normalization, and conversion to tensors.
3. Anchors and Scales: The ANCHORS list contains anchor box dimensions scaled to [0, 1]. The S list holds the grid sizes corresponding to the different scales. The scaled anchors are calculated from these.
4. The process_image_and_plot function takes an image, the model, and scaled anchors as inputs.
5. A list of example image paths is provided in the examples list.
6. Processed Image Function: The processed_image function takes an image as input and returns the processed image with bounding boxes and the corresponding CAM.
7. Interface Setup:

~~~
A web interface is set up using the gr.Interface class from the "gradio" library.
The processed_image function is used as the processing function.
The interface takes an input image and displays the processed image with bounding boxes and the CAM visualization.
The title, description, examples, and other settings are configured for the interface.
The interface is launched using demo.launch().
~~~

Gradio is then used to visualize the app. App-related information can be found at [Custom YoloV3 App](https://huggingface.co/spaces/PrarthanaTS/YoloV3_PASCAL).

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/object_detection/YOLOv3

The GradCAM implementation is based on the pytorch_grad_cam library (https://github.com/jacobgil/pytorch-grad-cam).

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
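
## Appendix: IoU and NMS Sketch

The `intersection_over_union` and `non_max_suppression` utilities described under Code Explanation can be illustrated with a minimal plain-Python sketch. This is not the repository's code: the repository's versions presumably operate on PyTorch tensors, while the functions below (`iou_midpoint` and `nms`, both hypothetical names) work on plain lists. The sketch assumes each box is `[class_idx, score, x_center, y_center, width, height]` in midpoint format, and that suppression is applied per class.

```python
# Minimal sketch, NOT the repository's implementation.
# Assumption: boxes are [class_idx, score, x_center, y_center, width, height].


def iou_midpoint(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, width, height)."""
    # Convert midpoint format to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Width and height of the overlap, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(bboxes, iou_threshold, score_threshold):
    """Greedy per-class non-maximum suppression."""
    # Drop low-confidence boxes, then process the rest by descending score.
    candidates = sorted(
        (b for b in bboxes if b[1] > score_threshold),
        key=lambda b: b[1],
        reverse=True,
    )
    kept = []
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        # Keep boxes of other classes, or same-class boxes with low overlap.
        candidates = [
            b
            for b in candidates
            if b[0] != best[0] or iou_midpoint(best[2:], b[2:]) < iou_threshold
        ]
    return kept


boxes = [
    [0, 0.9, 0.50, 0.5, 0.4, 0.4],
    [0, 0.8, 0.52, 0.5, 0.4, 0.4],  # same class, heavy overlap with the first
    [1, 0.7, 0.50, 0.5, 0.4, 0.4],  # different class, so not suppressed
    [0, 0.2, 0.10, 0.1, 0.1, 0.1],  # below the score threshold
]
kept = nms(boxes, iou_threshold=0.5, score_threshold=0.3)
print(kept)  # the 0.9 box of class 0 and the 0.7 box of class 1 remain
```

In this example the second box is suppressed because its IoU with the higher-scoring box of the same class is about 0.90, well above the 0.5 threshold. The class 1 box survives even though it overlaps the first box completely, since suppression in this sketch only compares boxes of the same class.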