In our implementation of a sparse encoder, we defined the KL divergence in a kl_divergence function of the SparseEncoder class, which directly implements the preceding formula:
def kl_divergence(self, p, p_hat):
    # KL divergence between Bernoulli(p) and Bernoulli(p_hat):
    # note the log of the ratio, p * log(p / p_hat), not a ratio of logs
    return tf.reduce_mean(
        p * tf.log(p / p_hat)
        + (1 - p) * tf.log((1 - p) / (1 - p_hat)))
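As a quick sanity check of the formula (independent of TensorFlow), here is a minimal NumPy sketch of the same Bernoulli KL divergence; the function name mirrors the class method but is otherwise a standalone illustration. It should be exactly zero when the average activation p_hat matches the target sparsity p, and positive otherwise:

```python
import numpy as np

def kl_divergence(p, p_hat):
    # KL divergence between Bernoulli(p) and Bernoulli(p_hat),
    # averaged over the hidden units
    p_hat = np.asarray(p_hat, dtype=float)
    return np.mean(p * np.log(p / p_hat)
                   + (1 - p) * np.log((1 - p) / (1 - p_hat)))

print(kl_divergence(0.05, 0.05))          # 0.0: activations match the target
print(kl_divergence(0.05, [0.2, 0.3]))    # positive: units are too active
```

Because the penalty grows as p_hat drifts from p in either direction, adding it to the reconstruction loss pushes the average hidden activations toward the sparsity target.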