Update README.md
README.md
CHANGED
@@ -29,8 +29,8 @@ FAQ (as the author imagines):
 - A: Perhaps. My bet is whoever tried it soon found that the model failed to train with standard ReLU. Without the belief in the underlying PDE perspective, maybe it wasn't pushed to its limit.
 - Q: Is it not similar to attention in Transformer?
 - A: It is, indeed. It's natural to wonder if the activation functions in Transformer could be removed (or reduced) while still achieving comparable performance.
-- Q: If the
-- A: The transformation in the demo indeed can be used to reduce the weights from the get-go. However, there are variants that admit an even larger symmetry that would be hard to remove.
+- Q: If the weight/parameter space has a symmetry (other than permutations), perhaps there's redundancy in the weights.
+- A: The transformation in the demo indeed can be used to reduce the weights from the get-go. However, there are variants of the model that admit an even larger symmetry that would be hard to remove. It is also related to the phenomenon of "flat minima" found empirically in conventional deep neural networks.
 
 *This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).*
 
@@ -45,7 +45,7 @@ Instead of the `bottleneck` block of ResNet50 which consists of 1x1, 3x3, 1x1 in
 
 - **Developed by:** Yao Liu 刘杳
 - **Model type:** Convolutional Neural Network (ConvNet)
-- **License:**
+- **License:** As academic work, it is free for all to use. It is a natural progression from the original ConvNet (of LeCun) and ResNet, with "depthwise" from MobileNet.
 - **Finetuned from model:** N/A (*trained from scratch*)
 
 ### Model Sources [optional]