Here we use sigmoid as the example activation function. We first define the `sigmoid` function and its derivative:

>>> def sigmoid(z):
...     return 1.0 / (1 + np.exp(-z))
>>> def sigmoid_derivative(z):
...     return sigmoid(z) * (1.0 - sigmoid(z))

You can derive the derivative yourself if you want to verify it.
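For reference, writing s(z) = 1 / (1 + e^(-z)), the chain rule gives s'(z) = e^(-z) / (1 + e^(-z))^2, which rearranges to s(z) * (1 - s(z)). A quick numerical check is another way to gain confidence in the formula. The following is a minimal sketch, not part of the original listing: it compares `sigmoid_derivative` against a central finite-difference approximation, where the test points `z` and the step size `h` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

# Illustrative test points and step size (assumed values, not from the text)
z = np.linspace(-5.0, 5.0, 11)
h = 1e-5

# Central finite difference: (f(z + h) - f(z - h)) / (2h)
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_derivative(z)

# Largest absolute disagreement between the two estimates
max_error = np.max(np.abs(numeric - analytic))
print(max_error)
```

If the analytic formula is right, `max_error` should be tiny compared with the derivative values themselves, which peak at 0.25 at z = 0.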

We then define the training function, which takes in the training dataset, the number of units in the hidden layer (we only use one hidden layer as an example), the learning rate, and the number of iterations:

>>> def train(X, y, n_hidden, learning_rate, n_iter):
...     m, n_input = X.shape
...     W1 = np.random.randn(n_input, n_hidden)
...     b1 = np.zeros((1, n_hidden))
...     W2 = np.random.randn(n_hidden, 1)
...     b2 = ...
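To make the parameter shapes concrete, here is a small standalone sketch of the initialization above followed by the hidden-layer computation. The toy data `X` (5 samples, 3 features), the choice `n_hidden = 4`, and the seed are assumptions made purely for illustration. The names `Z1` and `A1` are hypothetical and do not come from the listing, and `b2` is left out because its definition is cut off in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

np.random.seed(0)             # fixed seed so the sketch is repeatable
X = np.random.randn(5, 3)     # hypothetical toy data: 5 samples, 3 features
n_hidden = 4                  # hypothetical hidden layer size

# Same initialization pattern as in train()
m, n_input = X.shape
W1 = np.random.randn(n_input, n_hidden)   # input -> hidden weights
b1 = np.zeros((1, n_hidden))              # hidden bias, broadcast over samples
W2 = np.random.randn(n_hidden, 1)         # hidden -> output weights

# Hidden layer: affine transform followed by the sigmoid activation
Z1 = X @ W1 + b1
A1 = sigmoid(Z1)

print(W1.shape, b1.shape, W2.shape)   # (3, 4) (1, 4) (4, 1)
print(A1.shape, (A1 @ W2).shape)      # (5, 4) (5, 1)
```

Each row of `A1` holds the hidden activations for one sample, and multiplying by `W2` reduces them to a single value per sample, which is why `W2` has one column.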