---
datasets:
- imagenet-1k
library_name: timm
license: apache-2.0
pipeline_tag: image-classification
metrics:
- accuracy
tags:
- ResNet
- CNN
- PDE
---
# Model Card for QLNet

Based on quasi-linear hyperbolic systems of PDEs [Liu et al., 2023], QLNet ventures into uncharted waters of ConvNet model space, marked by the use of (element-wise) multiplication, in lieu of ReLU, as the primary nonlinearity. It achieves performance comparable to ResNet50 on ImageNet-1k (acc=78.4), demonstrating that it has the same level of capacity/expressivity, and deserves further analysis and study (hyper-parameter tuning, optimizers, etc.) by the academic community.
One notable feature is that the architecture (trained or not) admits a continuous symmetry in its parameters. Check out the notebook for a demo that applies a particular transformation to the weights while leaving the output unchanged.
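As a self-contained illustration (a sketch of one such rescaling symmetry, not necessarily the transformation used in the notebook): if a 1x1 convolution produces channels that are split into halves `u` and `v` and multiplied element-wise, then scaling the filters producing `u` by per-channel factors and the filters producing `v` by their reciprocals leaves the product `u * v`, and hence everything downstream, unchanged.

```python
import torch

torch.manual_seed(0)
C_in, C = 8, 4
x = torch.randn(1, C_in, 5, 5)
# bias=False so the rescaling of the weights is an exact symmetry
conv = torch.nn.Conv2d(C_in, 2 * C, kernel_size=1, bias=False)

def half_product(conv, x):
    # split the 2C output channels into halves and multiply element-wise
    u, v = conv(x).chunk(2, dim=1)
    return u * v

out_before = half_product(conv, x)

lam = torch.rand(C) + 0.5  # arbitrary positive per-channel scales
with torch.no_grad():
    conv.weight[:C] *= lam.view(-1, 1, 1, 1)  # filters producing u
    conv.weight[C:] /= lam.view(-1, 1, 1, 1)  # filters producing v

out_after = half_product(conv, x)
print(torch.allclose(out_before, out_after, atol=1e-5))  # True
```

Because the symmetry is continuous (any positive `lam` works), it carves out whole orbits of weight configurations with identical behavior.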
## FAQ (as the author imagines)
- Q: Who needs another ConvNet, when the SOTA for ImageNet-1k is now in the low 80s with models of comparable size?
- A: Aside from a shortage of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited in different ways. The non-elementwise nonlinearity also has a more "natural" quality (coordinate independence) that is inherent in the equations of mathematics and physics.
- Q: Multiplication is so simple that someone must have tried it before?
- A: Perhaps. My bet is that whoever tried it soon found the model failing to train under the standard ReLU-based recipe. Without belief in the underlying PDE perspective, maybe it wasn't pushed to its limit.
- Q: Is it not similar to attention in Transformers?
- A: It is, indeed. It's natural to wonder whether the activation functions in a Transformer could be removed (or reduced) while still achieving comparable performance.
## Model Details

### Model Description
Instead of the bottleneck block of ResNet50, which applies 1x1, 3x3, and 1x1 convolutions in succession, this simplest version of QLNet applies a 1x1 convolution, splits the result into two equal halves and multiplies them element-wise, then applies a 3x3 (depthwise) convolution and a 1x1 convolution, all without activation functions except at the end of the block, where a "radial" activation function that we call *hardball* is applied.
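The block described above can be sketched roughly as follows. This is a minimal reading of the description, not the reference implementation: the exact definition of *hardball* is not given in this card, so here it is approximated as a hypothetical clamp on the per-pixel channel-vector norm (a "radial" operation), and the channel sizes and absence of a residual branch are assumptions.

```python
import torch
import torch.nn as nn

class HardBall(nn.Module):
    """Hypothetical stand-in for the 'radial' hardball activation:
    clamps the norm of the channel vector at each spatial position
    to a fixed radius (an assumption, not the official definition)."""
    def __init__(self, radius=1.0):
        super().__init__()
        self.radius = radius

    def forward(self, x):
        norm = x.norm(dim=1, keepdim=True).clamp_min(1e-6)
        return x * (self.radius / norm).clamp(max=1.0)

class QLBlock(nn.Module):
    """Sketch of the block: 1x1 conv -> split into halves ->
    element-wise product -> 3x3 depthwise conv -> 1x1 conv -> hardball,
    with no ReLU anywhere."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, 2 * hidden, kernel_size=1)
        self.dw = nn.Conv2d(hidden, hidden, kernel_size=3,
                            padding=1, groups=hidden)  # depthwise
        self.conv2 = nn.Conv2d(hidden, channels, kernel_size=1)
        self.act = HardBall()

    def forward(self, x):
        u, v = self.conv1(x).chunk(2, dim=1)  # split into two halves
        y = self.dw(u * v)                    # multiplication as nonlinearity
        return self.act(self.conv2(y))

block = QLBlock(channels=64, hidden=32)
out = block(torch.randn(1, 64, 14, 14))
print(out.shape)  # torch.Size([1, 64, 14, 14])
```

Note that the only nonlinearities are the product `u * v` inside the block and the radial clamp at its end.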
- Developed by: Yao Liu 刘杳
- Model type: Convolutional Neural Network (ConvNet)
- License: apache-2.0
- Finetuned from model: N/A (trained from scratch)
## Model Sources
- Repository: ConvNet from the PDE perspective
- Paper: A Novel ConvNet Architecture with a Continuous Symmetry
- Demo: [More Information Needed]
## How to Get Started with the Model

Use the code below to get started with the model.
```python
import torch, timm
from qlnet import QLNet

model = QLNet()
# the checkpoint stores the weights under the 'state_dict' key
state = torch.load('qlnet-50-v0.pth.tar', map_location='cpu')
model.load_state_dict(state['state_dict'])
model.eval()
```
## Training Details

### Training and Testing Data
ImageNet-1k
### Training Procedure

We use the training script in timm:

```shell
python3 train.py ../datasets/imagenet/ --model resnet50 --num-classes 1000 --lr 0.1 --warmup-epochs 5 --epochs 240 --weight-decay 1e-4 --sched cosine --reprob 0.4 --recount 3 --remode pixel --aa rand-m7-mstd0.5-inc1 -b 192 -j 6 --amp --dist-bn reduce
```
## Results

- `qlnet-50-v0`: top-1 accuracy 78.40 on ImageNet-1k