AI & ML interests

TinyML, model compression, ultra-low-bit quantization, hardware-aware NAS

About Deeplite

Deeplite provides an AI inference optimization platform for the millions of developers struggling to deploy edge AI in smart city, 5G IoT, autonomous vehicle, and everyday products, by making AI models smaller, faster, and more energy efficient. We call it AI for everyday life.

PRODUCT

Deeplite’s Neutrino platform builds on the PyTorch ecosystem, natively supporting pre-trained multi-layer perceptron (MLP) and convolutional neural network (CNN) models with our optimization and quantization methods. Models can come from the Neutrino Zoo or be a user’s own custom model.

Our model optimization takes a model, a dataset, and a constraint from the user. The user specifies a target accuracy, and the optimization shrinks and/or speeds up model inference as much as possible within that accuracy budget. Uniquely, the model stays in full precision (no quantization), so the optimized model can be deployed to any target hardware, including AI accelerators/NPUs. While hardware-agnostic at full precision, users can also leverage the Hardware-Aware optimization within Neutrino to tune a model for a specific target hardware, improving its performance on that hardware and replacing any unsupported operations with supported ones.
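Neutrino's internals aren't public, but the accuracy-constrained idea can be sketched with plain PyTorch. The example below uses global magnitude pruning as a stand-in (Deeplite's actual optimization methods differ): sparsity is increased step by step, and the sparsest model that stays within the user's accuracy budget is kept.

```python
import copy

import torch
import torch.nn.utils.prune as prune

def top1_accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of a classifier over a dataloader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            preds = model(x.to(device)).argmax(dim=1)
            correct += (preds == y.to(device)).sum().item()
            total += y.numel()
    return correct / total

def prune_within_accuracy_budget(model, loader, max_acc_drop=0.01, step=0.1):
    """Increase global weight sparsity in steps; keep the sparsest model
    that stays within `max_acc_drop` of the baseline accuracy.

    (In practice you would fine-tune between pruning steps; omitted here
    for brevity.)
    """
    baseline = top1_accuracy(model, loader)
    best = model
    for i in range(1, 10):
        candidate = copy.deepcopy(model)
        params = [(m, "weight") for m in candidate.modules()
                  if isinstance(m, torch.nn.Conv2d)]
        prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                                  amount=step * i)
        if baseline - top1_accuracy(candidate, loader) <= max_acc_drop:
            best = candidate
        else:
            break
    return best
```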

Model quantization uses a quantization-aware training (QAT) approach that converts a full-precision PyTorch model into a mixed-precision model with 1-, 2-, 3-, 4-, 8-, and 32-bit layers (weights and activations). The quantization method maximizes accuracy retention while achieving 2-15x memory reduction and a 2-5x inference speedup when combined with the Deeplite compiler and runtime for on-device inference.
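As a rough illustration of how quantization-aware training works, here is a generic PyTorch sketch of uniform fake-quantization with a straight-through estimator; this is illustrative only, not Deeplite's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, num_bits):
    """Uniform affine fake-quantization: round to a num_bits grid, then
    dequantize back to float so the rest of the network stays in FP32."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_q = (q - zero_point) * scale
    # Straight-through estimator: the forward pass sees quantized values,
    # the backward pass treats rounding as identity so gradients still flow.
    return x + (x_q - x).detach()

class QuantConv2d(nn.Conv2d):
    """Conv2d whose weights and activations are fake-quantized each forward,
    e.g. 2-bit weights with 8-bit activations for a mixed-precision layer."""

    def __init__(self, *args, weight_bits=2, act_bits=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.weight_bits = weight_bits
        self.act_bits = act_bits

    def forward(self, x):
        w_q = fake_quantize(self.weight, self.weight_bits)
        x_q = fake_quantize(x, self.act_bits)
        return F.conv2d(x_q, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```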

Neutrino is deployed on-premises, so no sensitive data or models need to leave the user's own development environment. We have also created a runtime engine, DeepliteRT, which runs sub-8-bit quantized models natively on Arm CPUs.

SOLUTIONS

We are excited to release YOLOBench, a latency-accuracy benchmark of over 550 YOLO-based object detectors on four datasets (COCO, PASCAL VOC, WIDERFACE, and SKU-110K) and five initial embedded hardware platforms (x86 CPU, Arm CPU, NVIDIA GPU, Khadas VIM3 NPU, and Orange Pi NPU). Not all YOLO models are created equal: depending on the model architecture, the dataset, and the hardware platform, some YOLO models outperform others in accuracy and speed.

That's where YOLOBench by Deeplite comes in. YOLOBench provides a fair, controlled comparison of these models by using a fixed training environment (code and training hyperparameters) and collecting accuracy and latency numbers for every model-dataset combination. Find and download the best model for your hardware and use case on our Hugging Face Space.
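A common way to use latency-accuracy data like YOLOBench's is to extract the Pareto front for a given hardware target, i.e. the models for which no other model is both faster and more accurate. Below is a minimal sketch with pandas; the column names (model, latency_ms, map50_95) are assumptions for illustration, not YOLOBench's actual schema.

```python
import pandas as pd

def pareto_front(df: pd.DataFrame,
                 latency_col: str = "latency_ms",
                 acc_col: str = "map50_95") -> pd.DataFrame:
    """Keep only models where no other model is both faster and more accurate."""
    df = df.sort_values(latency_col).reset_index(drop=True)
    best_acc = float("-inf")
    keep = []
    for _, row in df.iterrows():
        # A slower model earns its place only by strictly improving accuracy.
        if row[acc_col] > best_acc:
            keep.append(True)
            best_acc = row[acc_col]
        else:
            keep.append(False)
    return df[keep]

# Example with made-up numbers:
results = pd.DataFrame({
    "model": ["yolo_a", "yolo_b", "yolo_c"],
    "latency_ms": [4.2, 7.9, 11.3],
    "map50_95": [0.31, 0.38, 0.36],
})
print(pareto_front(results))  # yolo_c is dominated by yolo_b and dropped
```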

Interested in adding benchmark data for your own hardware? Email us at yolobench@deeplite.ai for more information!
