Image Classification
timm
PDE
ConvNet
liuyao committed on
Commit 79fd544
1 Parent(s): 444c338

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -21,6 +21,16 @@ Based on **quasi-linear hyperbolic systems of PDEs** [[Liu et al, 2023](https://
 
 One notable feature is that the architecture (trained or not) admits a *continuous* symmetry in its parameters. Check out the [notebook](https://colab.research.google.com/#fileId=https://huggingface.co/liuyao/QLNet/blob/main/QLNet_symmetry.ipynb) for a demo that makes a particular transformation on the weights while leaving the output *unchanged*.
 
+ FAQ (as the author imagines it):
+
+ - Q: Who needs another ConvNet, when state-of-the-art top-1 accuracy on ImageNet-1k is already in the low 80s for models of comparable size?
+ - A: Aside from a shortage of resources for extensive experiments, the real answer is that the new symmetry has the potential to be exploited in different ways. The non-elementwise nonlinearity also has a certain "naturalness" (coordinate independence) that is inherent in the equations of mathematics and physics.
+ - Q: Multiplication is so simple, surely someone has tried it before?
+ - A: Perhaps. My bet is that whoever tried it soon found the model failing to train with standard ReLU, and without the conviction that comes from the underlying PDE perspective, it may not have been pushed to its limit.
+ - Q: Is it not similar to attention in the Transformer?
+ - A: It is, indeed. It is natural to wonder whether the activation functions in the Transformer could be removed (or reduced) while still achieving comparable performance.
+
+
 *This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).*
 
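To make the continuous-symmetry claim in the README paragraph above more concrete, here is a minimal NumPy sketch of the kind of weight transformation that can leave a multiplication-based layer's output unchanged. This is a generic toy construction, not the actual QLNet layer or the transformation demonstrated in the linked notebook; the names `toy_layer`, `W1`, `W2`, and the diagonal family `D(t)` are hypothetical and introduced purely for illustration.

```python
# Toy sketch (assumption: NOT the actual QLNet code). A layer whose
# "nonlinearity" is a product of two linear projections,
#     y = (W1 @ x) * (W2 @ x),
# admits a continuous family of weight transformations that leave the
# output unchanged: rescale W1 by a diagonal D(t) and W2 by D(t)^-1.
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))

def toy_layer(W1, W2, x):
    # elementwise product of two linear projections (no ReLU anywhere)
    return (W1 @ x) * (W2 @ x)

# one-parameter family of symmetries: D(t) = diag(exp(t * a))
a = rng.standard_normal(d)
t = 0.7
D = np.diag(np.exp(t * a))
D_inv = np.diag(np.exp(-t * a))

y_before = toy_layer(W1, W2, x)
y_after = toy_layer(D @ W1, D_inv @ W2, x)
print(np.allclose(y_before, y_after))  # True: the output is unchanged
```

The transformation acting on QLNet's actual weights may differ (the notebook shows the precise one); the toy example is meant only to capture the flavor: when the nonlinearity is multiplicative rather than an elementwise ReLU, a continuous reparameterization of the weights can cancel out exactly.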