The GRU is simpler than the LSTM and has only two internal gates: the update gate (z_t) and the reset gate (r_t). The update and reset gates are computed as follows:
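In the standard GRU formulation, the two gates can be written as below. The weight matrices W_z, U_z, W_r, U_r and the biases b_z, b_r are assumed notation rather than names taken from the text, and \sigma denotes the logistic sigmoid.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z s_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r s_{t-1} + b_r\right)
\end{aligned}
```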
The state s_t at timestep t is computed from the input x_t, the state s_{t-1} from the previous timestep, and the update and reset gates:
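One common way to write this state update is sketched below. The names W_h, U_h, b_h and the candidate state \tilde{s}_t are assumptions, and \odot denotes element-wise multiplication. This version uses the convention in which z_t weights the previous state, so a larger z_t means more of the previous memory is retained; some references swap the roles of z_t and 1 - z_t.

```latex
\begin{aligned}
\tilde{s}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot s_{t-1}\right) + b_h\right) \\
s_t &= z_t \odot s_{t-1} + \left(1 - z_t\right) \odot \tilde{s}_t
\end{aligned}
```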
The update gate, computed with a sigmoid function, determines how much of the previous step's memory is retained in the current timestep. The reset gate controls how the previous memory is combined with the current step's input.
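The roles of the two gates can be illustrated with a minimal sketch of a single GRU step in plain Python. It assumes the common formulation in which the update gate z_t weights the previous state and the reset gate r_t scales the previous state before it enters the candidate. The parameter names (W_*, U_*, b_*) and the helpers `affine`, `init_params`, and `gru_step` are hypothetical, chosen for illustration only.

```python
import math
import random


def sigmoid(values):
    """Element-wise logistic sigmoid over a list of floats."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(W, x, U, s, b):
    """Return W @ x + U @ s + b for matrices stored as nested lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * si for u, si in zip(u_row, s))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (illustrative initialization only)."""
    rng = random.Random(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params[f"W_{gate}"] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(hidden_size)
        ]
        params[f"U_{gate}"] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(hidden_size)
        ]
        params[f"b_{gate}"] = [0.0] * hidden_size
    return params


def gru_step(x_t, s_prev, p):
    """One GRU timestep; returns the new state and both gate activations."""
    # Update gate: how much of the previous state to keep.
    z_t = sigmoid(affine(p["W_z"], x_t, p["U_z"], s_prev, p["b_z"]))
    # Reset gate: how much of the previous state feeds the candidate.
    r_t = sigmoid(affine(p["W_r"], x_t, p["U_r"], s_prev, p["b_r"]))
    # Candidate state built from the input and the reset-scaled previous state.
    gated_prev = [r * s for r, s in zip(r_t, s_prev)]
    h_t = [
        math.tanh(a)
        for a in affine(p["W_h"], x_t, p["U_h"], gated_prev, p["b_h"])
    ]
    # Blend previous state and candidate (assumed convention: z_t keeps s_prev).
    s_t = [z * s + (1.0 - z) * h for z, s, h in zip(z_t, s_prev, h_t)]
    return s_t, z_t, r_t


# Run a short input sequence through the cell, starting from a zero state.
params = init_params(input_size=2, hidden_size=3)
state = [0.0, 0.0, 0.0]
for x in ([1.0, -1.0], [0.5, 0.25], [-0.3, 0.8]):
    state, z, r = gru_step(x, state, params)
```

With this convention, pushing the update-gate bias strongly positive drives z_t toward 1, so the cell copies its previous state forward almost unchanged. Pushing it strongly negative makes the new state follow the candidate instead.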
Compared to LSTM, which has three gates, ...