Model Card for checkThat_YOLOv5

This finetuned YOLOv5 model is developed to aid businesses in automating the inspection of returned goods. It utilizes advanced computer vision techniques to detect, classify, and assess the condition of items from images, determining whether returns are genuine or potentially fraudulent. The model is tailored to recognize various product conditions and features that align with common return reasons, enabling quick and efficient processing within return workflows.

Model Details

Model Description

The finetuned YOLOv5 model is designed specifically for use in retail and ecommerce environments to assist with the assessment of returned merchandise. It uses deep learning algorithms to analyze images of returned items, identifying specific product features, damages, or discrepancies that may indicate misuse or fraud. This model has been trained on a diverse dataset of product images, capturing a wide range of conditions, from new to heavily used items.

The model's capabilities include detecting subtle signs of wear and tear, modifications, or missing components that are often overlooked in manual inspections. By automating the inspection process, the model helps streamline return operations, reduce human error, and prevent fraudulent returns, thereby protecting revenue and improving customer service efficiency.

This YOLOv5 model variant has been optimized to perform well under various lighting conditions and camera angles, making it robust and reliable for deployment in varied operational settings where returns are processed. It integrates seamlessly with existing computer vision pipelines and can be further connected to APIs like OpenAI's GPT for enhanced decision-making about the item's return eligibility based on visual assessment.
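
The snippet below is a minimal, illustrative sketch of that hand-off: YOLOv5 detections are summarized as text and sent to OpenAI's Chat Completions API for a return-eligibility suggestion. The label names, prompt wording, and the decide_return_eligibility helper are hypothetical and not part of this model.

from openai import OpenAI

def decide_return_eligibility(detections):
    # detections: hypothetical list of (label, confidence) pairs produced by the YOLOv5 model
    summary = ", ".join(f"{label} ({conf:.2f})" for label, conf in detections)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name; substitute the model you actually use
        messages=[
            {"role": "system", "content": "You assess whether a returned item appears eligible for refund."},
            {"role": "user", "content": f"Detected conditions: {summary}. Is this return likely genuine or fraudulent?"},
        ],
    )
    return response.choices[0].message.content

# Example: decide_return_eligibility([("scratched_surface", 0.91), ("missing_cable", 0.78)])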

  • Developed by: Cody Liu, Arjun Dabir
  • Model type: YOLOv5 (You Only Look Once version 5), Fine-tuned Object Detection Model
  • Language(s) (NLP): Not applicable (computer-vision model; implemented in Python)
  • License: Apache License 2.0
  • Finetuned from model: YOLOv5

Uses

Direct Use

This finetuned YOLOv5 model is designed to detect and classify objects in images for return verification processes. It's intended for businesses to automate the inspection of returned goods, determining their condition and authenticity. The primary users are retail companies and online marketplaces aiming to streamline return operations and reduce fraudulent activities.

Out-of-Scope Use

The model is not intended for applications beyond visual inspection tasks, such as medical image analysis, autonomous driving, or any environment where its object detection capabilities may not apply directly. It should not be used as a standalone decision-maker without human oversight due to the potential for misclassification. Misuse includes any application involving sensitive personal data or scenarios where a misclassification could lead to safety risks.

Bias, Risks, and Limitations

This model, a fine-tuned version of YOLOv5 for object detection, is integrated with a GPT-based API to assess the condition of returned items. While this setup aims to automate the evaluation of returned goods, several biases, risks, and limitations are inherent in the technology:

  • Bias in Training Data: The object detection model's performance depends on the diversity and representativeness of its training dataset. If the training data lacks variety in item conditions, environments, or object types, the model may exhibit biased or degraded behavior on underrepresented categories.
  • Risk of Hallucination in the LLM: Using a language model (GPT) to interpret object detection results introduces a risk of "hallucinations", i.e. generating incorrect or misleading information about the detected items. These inaccuracies can lead to incorrect assessments of item condition, potentially categorizing non-fraudulent returns as fraudulent.
  • Limitations in Detection Capabilities: While YOLOv5 is robust at detecting objects in diverse and complex scenes, its accuracy can be compromised by poor lighting, occlusion, or unusual item orientations. These factors can lead to false negatives or false positives in identifying items and their conditions.
  • Sociotechnical Implications: Relying on automated systems to assess returns can affect consumer trust and satisfaction. Incorrect assessments due to model limitations or errors can lead to customer dissatisfaction and loss of business, particularly if customers feel their returns are unjustly categorized.
  • Out-of-Scope Use: The model is not designed for, and should not be used in, scenarios involving sensitive or regulated items, such as pharmaceuticals, where specialized detection and assessment systems are required. Misuse in such contexts could lead to serious safety and compliance issues.

Acknowledging these limitations is crucial for deploying the model in a manner that minimizes risks and ensures fairness and accuracy in its applications. Further, continuous monitoring and updating of both the object detection and language processing components are recommended to address emergent biases or inaccuracies.

Recommendations

Given the identified biases, risks, and limitations associated with the combined use of the YOLOv5 object detection model and the GPT language model in the returns assessment pipeline, the following recommendations are proposed to mitigate potential issues and enhance overall system effectiveness:

  • Enhance Dataset Diversity: Regularly update and expand the YOLOv5 training datasets to include a wider range of items, conditions, and environmental factors. This reduces bias and improves accuracy across diverse real-world scenarios.
  • Improve Error Handling: Develop robust error-handling and verification protocols to mitigate the risk of hallucinations from the GPT model, for example cross-verification against additional data sources or manual review of uncertain or high-risk assessments (see the sketch after this list).
  • Conduct Regular Model Audits: Periodically audit both the YOLOv5 and GPT models to assess and improve their performance and fairness, including testing against new and varied datasets to identify drift or emerging biases.
  • Increase Transparency: Provide clear documentation of the model's capabilities, limitations, and the basis of its decisions, for example detailed logs of decision pathways and the factors influencing assessments, accessible to both customers and regulatory bodies.
  • User Education: Educate users and stakeholders about the capabilities, general workings, and limitations of the AI system to set realistic expectations and promote informed, cautious use.
  • Develop Contingency Plans: Establish contingency plans, including manual oversight and customer-service interventions, to handle disputes or failures in the automated system, maintaining customer trust and limiting the impact of model failures.
  • Ethical and Compliance Checks: Ensure that deployment and ongoing use of the model comply with relevant laws and ethical guidelines, particularly those concerning consumer rights and data protection.

Implementing these recommendations will help leverage AI capabilities responsibly while maintaining trust and compliance.
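
As one illustration of the "Improve Error Handling" recommendation, the sketch below routes low-confidence or high-value assessments to manual review instead of accepting the automated verdict. The thresholds and field names are hypothetical.

def route_assessment(detection_confidence, item_value, automated_verdict,
                     conf_threshold=0.80, value_threshold=200.0):
    # Accept the automated verdict only when the model is confident and the stakes are low;
    # otherwise escalate to a human reviewer.
    if detection_confidence < conf_threshold or item_value >= value_threshold:
        return {"decision": "manual_review", "reason": "low confidence or high-value item"}
    return {"decision": automated_verdict, "reason": "automated assessment accepted"}

# Example: route_assessment(0.65, 350.0, "genuine") -> manual review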

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

# These imports come from the Ultralytics YOLOv5 repository; run this script from a clone of that repo.
import torch
import intel_extension_for_pytorch as ipex
from models.common import DetectMultiBackend
from utils.general import non_max_suppression, scale_boxes
from utils.torch_utils import select_device
from utils.dataloaders import LoadImages

def run_inference(weights, source, imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45):
    # Initialize device and model
    device = select_device('')
    model = DetectMultiBackend(weights, device=device, dnn=False)
    model = ipex.optimize(model, dtype=torch.float32)  # Optimize model

    # Load image
    dataset = LoadImages(source, img_size=imgsz, stride=model.stride, auto=model.pt)
    # LoadImages yields (path, img, im0s, ...); the trailing fields differ between YOLOv5 versions.
    path, img, im0s, *_ = next(iter(dataset))

    # Inference
    img = torch.from_numpy(img).to(device)
    img = img.float()  # uint8 to fp32
    img /= 255  # 0 - 255 to 0.0 - 1.0
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim

    with torch.cpu.amp.autocast():  # Enable mixed precision
        pred = model(img, augment=False, visualize=False)

    # Apply non-max suppression
    pred = non_max_suppression(pred, conf_thres, iou_thres)

    # Scale boxes to original image size and display or save
    for i, det in enumerate(pred):  # detections per image
        if len(det):
            det[:, :4] = scale_boxes(img.shape[2:], det[:, :4], im0s.shape).round()

    return det  # Return detections

if __name__ == '__main__':
    weights_path = 'path/to/yolov5s.pt'
    image_path = 'path/to/image.jpg'
    detections = run_inference(weights_path, image_path)
    print(f'Detections: {detections}')

Key Modifications:

  1. Intel IPEX Optimization: The model is wrapped with ipex.optimize() right after its instantiation to apply Intel-specific optimizations. You can specify the data type (torch.float32 or torch.bfloat16) based on your preference for precision and performance.
  2. Mixed Precision: Utilizes torch.cpu.amp.autocast() for mixed precision during inference, which can provide a boost in performance with minimal impact on accuracy when running on CPUs that support vector neural network instructions (VNNI).
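
If the target CPU supports BFloat16, the same inference path can be run in bf16 instead of fp32. A minimal variant of the optimization and autocast calls from the snippet above is sketched below; bf16 support on your hardware is an assumption.

model = ipex.optimize(model, dtype=torch.bfloat16)  # optimize for bf16 instead of fp32
with torch.cpu.amp.autocast(dtype=torch.bfloat16):  # run inference in mixed bf16 precision
    pred = model(img, augment=False, visualize=False)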

Training Details

Training Data

https://huggingface.co/datasets/imagenet-1k

The ImageNet-1K dataset, available on Hugging Face, provides access to a subset of the larger ImageNet database, specifically the ILSVRC 2012 configuration. It includes 1,281,167 training images, 50,000 validation images, and 100,000 test images across 1,000 different object classes. This dataset is a fundamental resource for training deep learning models in various computer vision tasks due to its extensive range of high-quality, human-annotated images.
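
For reference, the dataset can be streamed from the Hugging Face Hub with the datasets library; access is gated, so you must accept the ImageNet terms and authenticate with a Hugging Face token first. A minimal sketch:

from datasets import load_dataset

# Streaming avoids downloading the full archive up front.
dataset = load_dataset("imagenet-1k", split="train", streaming=True)
sample = next(iter(dataset))
print(sample["label"], sample["image"].size)  # class index and PIL image dimensions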

Training Procedure

The YOLOv5 model was fine-tuned using the Intel® Extension for PyTorch*, which significantly optimized its performance on Intel architectures. This extension allows for more efficient computation and resource utilization, especially by enhancing the utilization of CPU capabilities, which are often less emphasized in typical GPU-centric training processes.

Technical Integration:

  • Intel® Extension for PyTorch*: Optimizes PyTorch operations on Intel CPUs, leveraging Intel's oneDNN primitives to improve both training and inference speed.
  • Intel® Deep Learning Boost (VNNI): Employed to accelerate the integer operations common in convolutional networks like YOLOv5, improving model throughput during training.
  • BFloat16 Training: The BFloat16 data type supported by Intel CPUs allowed training with larger batch sizes and faster epochs with minimal impact on precision (see the sketch after this list).
  • Parallel Training: Intel's oneAPI Collective Communications Library (oneCCL) was used for distributed training across Intel CPUs, improving scalability and reducing training time.
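
The sketch below shows, in simplified form, how ipex.optimize can be applied to a bf16 training loop. The tiny network, random batches, and loss function are stand-ins, not the actual YOLOv5 fine-tuning script.

import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Tiny stand-in network; the actual fine-tuning used the YOLOv5 architecture and its loss.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

model.train()
# ipex.optimize returns a model/optimizer pair tuned for Intel CPUs; dtype=torch.bfloat16 enables bf16 training.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

for _ in range(3):  # placeholder loop over random batches instead of a real data loader
    images = torch.randn(8, 3, 64, 64)
    targets = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):
        loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()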

Performance Improvements: The optimizations led to a noticeable increase in training speed and efficiency compared to traditional training setups on similar hardware. Energy efficiency was also prioritized, with adjustments during training phases resulting in reduced power consumption.

Tools and Libraries:

  • Intel VTune™ Profiler: Used to analyze the model's performance during training, helping to identify computational bottlenecks and optimize processing.
  • Intel® Advisor: Provided recommendations for vectorization and threading improvements, crucial for maximizing the multi-core capabilities of Intel CPUs.

These enhancements facilitated by Intel’s tools not only shortened the training cycle but also improved the overall efficiency of the YOLOv5 model, making it highly suitable for integration into computer vision pipelines that assess product returns.

Training Hyperparameters

  • Training regime: bf16 mixed precision

Results

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Small VM - 4th Gen Intel® Xeon® Scalable processor
  • Cloud Provider: Intel® Developer Cloud
  • Compute Region: us-region-1

Citation

https://zenodo.org/records/7347926
