Update README.md

README.md

FAQ (as the author imagines):

- Q: Who needs another ConvNet, when the SOTA for ImageNet-1k is now in the low 80s with models of comparable size?

- A: Aside from a shortage of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited in different ways (e.g., in optimization). The non-activation nonlinearity does have more of the "natural"-ness (coordinate independence) that is inherent in many equations in mathematics and physics. Activation is but a legacy from the early days of models inspired by biological *neural* networks.

- Q: Multiplication is too simple; surely someone must have tried it?

- A: Perhaps. My bet is that whoever tried it soon found the model failed to train with standard ReLU. Without belief in the underlying PDE perspective, maybe it wasn't pushed to its limit. (A minimal sketch of the multiplicative idea follows this FAQ.)

- Q: Is it not similar to attention in Transformer?
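
The two recurring ideas above, multiplication as the sole nonlinearity and the scaling symmetry ("coordinate independence") it brings, can be made concrete. Below is a minimal PyTorch sketch of one plausible reading; the `MulBlock` name, the two-branch structure, and the bias-free convolutions are illustrative assumptions on my part, not the architecture defined in this repo.

```python
# Hypothetical illustration only -- not the block defined in this repo.
import torch
import torch.nn as nn

class MulBlock(nn.Module):
    """A conv block whose only nonlinearity is elementwise multiplication."""

    def __init__(self, channels: int):
        super().__init__()
        # bias=False keeps each branch exactly linear, so the scaling
        # symmetry demonstrated below holds exactly.
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No ReLU anywhere: the product of two linear branches is the
        # sole source of nonlinearity (a quadratic form in x).
        return self.conv_a(x) * self.conv_b(x)

block = MulBlock(8)
x = torch.randn(1, 8, 16, 16)
# "Coordinate independence" in miniature: rescaling the input by any c,
# positive or negative, rescales the output by exactly c**2, whereas
# ReLU(c * x) == c * ReLU(x) holds only for c >= 0.
print(torch.allclose(block(3.0 * x), 9.0 * block(x), atol=1e-4))  # True
```

Dropping the biases is what makes the degree-2 scaling relation exact; the broader point is simply that a product has no preferred origin, while ReLU pins a kink at zero.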