# FAQs
**Q:** If the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does "zero convolution" work?
**A:** This intuition is wrong. Let us consider a very simple layer
$$y=wx+b$$
and we have
$$\partial y/\partial w=x, \partial y/\partial x=w, \partial y/\partial b=1$$
and if $w=0$ and $x \neq 0$, then
$$\partial y/\partial w \neq 0, \partial y/\partial x=0, \partial y/\partial b\neq 0$$
which means that as long as $x \neq 0$, a single gradient descent step will make $w$ non-zero. Then
$$\partial y/\partial x\neq 0$$
so the zero convolution progressively becomes a common conv layer with non-zero weights.
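For illustration, here is a minimal PyTorch sketch (not part of the original FAQ; the layer sizes, loss, and learning rate are arbitrary assumptions) showing that a zero-initialized conv still receives non-zero gradients and moves off zero after one optimizer step:

```python
import torch
import torch.nn as nn

# A 1x1 conv whose weight and bias start at zero ("zero convolution").
conv = nn.Conv2d(4, 4, kernel_size=1)
nn.init.zeros_(conv.weight)
nn.init.zeros_(conv.bias)

x = torch.randn(1, 4, 8, 8)  # non-zero input feature map
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

y = conv(x)  # output is all zeros at this point
loss = (y - torch.ones_like(y)).pow(2).mean()
loss.backward()

# Gradients are non-zero because dy/dw = x and dy/db = 1, even though w = 0.
print(conv.weight.grad.abs().sum())  # > 0
print(conv.bias.grad.abs().sum())    # > 0

optimizer.step()
print(conv.weight.abs().sum())       # > 0: the weights have left zero after one step
```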