January 2018
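For ReLU specifically, Kaiming He and others, in their paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (https://arxiv.org/abs/1502.01852), proposed an initialization designed for ReLUs: the weights are drawn from a zero-mean Gaussian with variance $\mathrm{Var}(w) = 2 / n_{\mathrm{in}}$, where $n_{\mathrm{in}}$ is the number of inputs to the unit.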
For example, in Python with NumPy, where n_in is the number of inputs to the unit, this can be written as follows:
>>> import numpy as np
>>> w = np.random.randn(n_in) * np.sqrt(2.0 / n_in)
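How do we understand this? Intuitively, a rectified linear unit outputs zero for roughly half of its inputs (when they are symmetric around zero), so we need to double the weight variance, relative to the 1/n_in scaling that suits a linear unit, to keep the signal's variance constant from layer to layer.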
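The intuition can be checked numerically. The following is a minimal sketch, not code from the paper: it pushes random inputs through a stack of ReLU layers and compares the mean squared activation at the end under the 2/n_in scaling and under a plain 1/n_in scaling. The helper name final_mean_square, the layer count, the layer width, and the seed are all assumptions chosen for illustration.

```python
import numpy as np


def relu(x):
    # Rectified linear unit: negative values become zero.
    return np.maximum(x, 0.0)


def final_mean_square(numerator, n_layers=20, n_units=512, n_samples=256, seed=0):
    # Hypothetical helper for this illustration only.
    # Weights are drawn as N(0, numerator / n_units), so numerator=2.0
    # gives the He scaling and numerator=1.0 gives the 1/n_in scaling.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, n_units))
    for _ in range(n_layers):
        w = rng.standard_normal((n_units, n_units)) * np.sqrt(numerator / n_units)
        x = relu(x @ w)
    # Mean squared activation after the last ReLU layer.
    return float(np.mean(x ** 2))


he = final_mean_square(2.0)     # variance 2 / n_in
plain = final_mean_square(1.0)  # variance 1 / n_in

print("2/n_in scaling:", he)
print("1/n_in scaling:", plain)
```

Under these assumptions, the 2/n_in run should keep the signal at roughly the scale of the input, while the 1/n_in run loses about half of its mean squared activation at every layer and ends up many orders of magnitude smaller after 20 layers.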