Update README.md
README.md
@@ -24,7 +24,7 @@ One notable feature is that the architecture (trained or not) admits a *continuous
 FAQ (as the author imagines):

 - Q: Who needs another ConvNet, when the SOTA for ImageNet-1k is now in the low 80s with models of comparable size?
-- A: Aside from a shortage of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited
+- A: Aside from a shortage of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited (e.g., symmetry-aware optimization). The non-activation nonlinearity also has more "naturalness" (coordinate independence), a property innate to many equations in mathematics and physics. Activation functions are but a legacy from the early days of models inspired by *biological* neural networks.
 - Q: Multiplication is too simple; someone must have tried it?
 - A: Perhaps. My bet is that whoever tried it soon found the model failing to train with standard ReLU. Without belief in the underlying PDE perspective, maybe it wasn't pushed to its limit.
 - Q: Is it not similar to attention in the Transformer?
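To make the FAQ concrete: below is a minimal sketch of what an activation-free, multiplication-based block could look like, assuming "multiplication" means the elementwise product of two parallel convolution branches. All names here (`MulBlock`, `conv_a`, `conv_b`) are hypothetical illustrations, not the repository's actual modules.

```python
# Hypothetical sketch (not the repo's code): the only nonlinearity is an
# elementwise product of two convolution branches, replacing ReLU.
import torch
import torch.nn as nn

class MulBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two parallel linear (conv) branches whose product supplies
        # the nonlinearity a pointwise activation would otherwise provide.
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Bilinear (quadratic) interaction between input-dependent quantities;
        # the residual connection helps keep training stable.
        return x + self.norm(self.conv_a(x) * self.conv_b(x))

x = torch.randn(2, 16, 32, 32)
print(MulBlock(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```

The product is a multiplicative interaction between input-dependent quantities, which is presumably where the kinship with Transformer attention (queries multiplied with keys, attention weights multiplied with values) raised by the last question comes from.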