One notable feature is that the architecture (trained or not) admits a *continuous* symmetry in its parameter space.

FAQ (as the author imagines):
- **Q**: *Who needs another ConvNet, when the SOTA for ImageNet-1k is now in the low 80s with models of comparable size?*
- **A**: Aside from a lack of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited (e.g., symmetry-aware optimization). The non-activation nonlinearity does have more "naturalness" (coordinate independence), of the kind innate in many equations in mathematics and physics. Activation functions are but a legacy from the early days of models inspired by *biological* neural networks.
- **Q**: *Multiplication is too simple; surely someone must have tried it?*
- **A**: Perhaps. My guess is that whoever tried it soon found that the model fails to train with standard ReLU. Without conviction in the underlying PDE perspective, it probably wasn't pushed to its limit. (A minimal sketch of such a multiplicative block appears after this list.)
- **Q**: *Is it not similar to attention in Transformers?*
- **A**: It is, indeed. It's natural to wonder whether the activation functions in a Transformer could be removed (or reduced) while still achieving comparable performance.
- **Q**: *If the parameter space has a symmetry (beyond permutations), perhaps there's redundancy in the weights.*
- **A**: The transformation in our demo can indeed be used to reduce the weights from the get-go. However, there are variants of the model that admit a much larger symmetry. It is also related to the phenomenon of "flat minima" observed empirically in some conventional neural networks. (See the symmetry sketch after this list.)

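To make the "multiplication as nonlinearity" idea concrete, here is a minimal PyTorch sketch of an activation-free multiplicative block. It is an illustration under assumptions, not this repository's actual code: the name `MulConvBlock` and the two-branch design are hypothetical.

```python
# A minimal sketch (illustrative, not this repository's actual code):
# an activation-free block whose only nonlinearity is the elementwise
# product of two parallel convolution branches.
import torch
import torch.nn as nn

class MulConvBlock(nn.Module):  # hypothetical name, for illustration
    def __init__(self, channels: int):
        super().__init__()
        # Two parallel affine branches; no ReLU/GELU anywhere.
        self.a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The product is the sole nonlinearity: quadratic in the input,
        # with no distinguished point like ReLU's kink at zero.
        return self.a(x) * self.b(x)

x = torch.randn(1, 16, 32, 32)
print(MulConvBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```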
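And here is a continuous (non-permutation) symmetry made explicit for the toy block above: rescaling one branch by a constant and the other by its reciprocal changes the weights but not the computed function. This shows the *kind* of symmetry at issue; the actual transformation used in the demo may differ.

```python
# Continuing the sketch above: MulConvBlock has a continuous symmetry.
# Scaling branch `a` by any nonzero c and branch `b` by 1/c changes the
# weights but not the function: (c * a(x)) * (b(x) / c) = a(x) * b(x).
import torch

block = MulConvBlock(16)
x = torch.randn(1, 16, 32, 32)
y0 = block(x)

c = 3.7  # any nonzero c traces out a one-parameter family of equivalent weights
with torch.no_grad():
    block.a.weight *= c
    block.a.bias *= c
    block.b.weight /= c
    block.b.bias /= c

y1 = block(x)
print(torch.allclose(y0, y1, atol=1e-5))  # True: same function, new weights
```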
*This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).*