Initialization

Welcome to the first assignment of "Improving Deep Neural Networks".

Training your neural network requires specifying initial values for the weights. A well-chosen initialization method will help the network learn.

If you completed the previous course of this specialization, you probably followed our instructions for weight initialization, and it has worked out so far. But how do you choose the initialization for a new neural network? In this notebook, you will see how different initializations lead to different results.

A well-chosen initialization can:

  • Speed up the convergence of gradient descent.
  • Increase the odds of gradient descent converging to a lower training (and generalization) error.

To get started, run the following cell to load the packages and the planar dataset you will try to classify.
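In the notebook, that cell looks roughly like the following. This is a sketch assuming the init_utils helper module that ships with the assignment (providing the activation functions, the propagation helpers, and load_dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets

# init_utils ships with the assignment; these helper names are assumed here
from init_utils import sigmoid, relu, compute_loss, forward_propagation
from init_utils import backward_propagation, update_parameters, predict
from init_utils import load_dataset, plot_decision_boundary, predict_dec

plt.rcParams['figure.figsize'] = (7.0, 4.0)  # default figure size

# load the planar "blue dots vs. red dots" dataset
train_X, train_Y, test_X, test_Y = load_dataset()
```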

You would like a classifier to separate the blue dots from the red dots.

1 - Neural Network model

You will use a 3-layer neural network (already implemented for you). Here are the initialization methods you will experiment with:

  • Zeros initialization -- setting initialization = "zeros" in the input argument.
  • Random initialization -- setting initialization = "random". This initializes the weights to large random values.
  • He initialization -- setting initialization = "he". This initializes the weights to random values scaled according to He et al., 2015.

Instructions: Please quickly read over the code below, and run it. In the next part you will implement the three initialization methods that this model() calls.
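For orientation, the dispatch inside model() looks roughly like this sketch (the three initialize_parameters_* helpers are the ones you will implement in the exercises below; the wrapper function here is illustrative, not the notebook's exact code):

```python
def initialize_parameters(layers_dims, initialization):
    # Sketch: model() selects an initializer from its string argument.
    if initialization == "zeros":
        return initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        return initialize_parameters_random(layers_dims)
    elif initialization == "he":
        return initialize_parameters_he(layers_dims)
    raise ValueError("unknown initialization: " + initialization)
```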

2 - Zero initialization

There are two types of parameters to initialize in a neural network:

  • the weight matrices $(W^{[1]}, W^{[2]}, ..., W^{[L]})$
  • the bias vectors $(b^{[1]}, b^{[2]}, ..., b^{[L]})$

Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.
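A possible implementation is sketched below, assuming layers_dims is a list of layer sizes with the input layer first (e.g. [3, 2, 1] for the test case whose output follows):

```python
import numpy as np

def initialize_parameters_zeros(layers_dims):
    """Initialize every weight matrix and bias vector to zeros.

    layers_dims -- list of layer sizes, input layer first.
    Returns parameters -- dict with keys "W1", "b1", ..., "WL", "bL".
    """
    parameters = {}
    L = len(layers_dims)  # number of layers, counting the input layer
    for l in range(1, L):
        # W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```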

Expected Output:

| Parameter | Value |
| --- | --- |
| **W1** | [[ 0. 0. 0.] [ 0. 0. 0.]] |
| **b1** | [[ 0.] [ 0.]] |
| **W2** | [[ 0. 0.]] |
| **b2** | [[ 0.]] |

Run the following code to train your model for 15,000 iterations using zeros initialization.
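The training-and-evaluation cell typically looks like this, assuming the model() and predict() helpers provided with the notebook:

```python
# Train the 3-layer net with all-zero initialization, then measure accuracy.
parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
```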

The performance is very bad: the cost does not really decrease, and the algorithm performs no better than random guessing. Why? Let's look at the details of the predictions and the decision boundary:

The model is predicting 0 for every example.

In general, initializing all the weights to zero fails to break symmetry: every neuron in a layer computes the same function and receives the same gradient update, so all neurons in that layer stay identical. You might as well be training a neural network with $n^{[l]}=1$ for every layer, and the network is no more powerful than a linear classifier such as logistic regression.
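You can see this concretely in the following self-contained sketch: gradient descent on a toy 1-hidden-layer network (tanh hidden units, sigmoid output; the sizes and learning rate are illustrative). With zero initialization the hidden weights never receive a nonzero gradient, so they stay at zero; only the output bias moves, which is why the model ends up predicting a single constant class.

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(2, 10)                     # 2 features, 10 examples (toy data)
Y = (np.random.rand(1, 10) > 0.5).astype(float)
m = X.shape[1]

W1 = np.zeros((4, 2)); b1 = np.zeros((4, 1))   # zero initialization
W2 = np.zeros((1, 4)); b2 = np.zeros((1, 1))

for _ in range(100):                           # plain gradient descent
    A1 = np.tanh(W1 @ X + b1)                  # hidden activations
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))     # sigmoid output
    dZ2 = A2 - Y                               # cross-entropy gradient at the output
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)         # tanh derivative
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1 -= 0.1 * dW1; b1 -= 0.1 * db1
    W2 -= 0.1 * dW2; b2 -= 0.1 * db2

print(W1, W2)  # both still all zeros: the neurons never differentiated
```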

What you should remember:

  • The weights $W^{[l]}$ should be initialized randomly to break symmetry.
  • It is, however, okay to initialize the biases $b^{[l]}$ to zeros; symmetry is still broken as long as $W^{[l]}$ is initialized randomly, as in the sketch below.
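A minimal sketch of this rule (the function name and the 0.01 scale are illustrative choices, not the notebook's required ones): small random weights break symmetry, while the biases can safely start at zero.

```python
import numpy as np

def initialize_parameters_random_sketch(layers_dims, scale=0.01):
    """Illustrative helper: random weights break symmetry; zero biases are fine."""
    parameters = {}
    for l in range(1, len(layers_dims)):
        # random draws make each neuron start differently; scale keeps them small
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * scale
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```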