July 2018
406 pages
We will have to extend dist_raw to calculate the vector distance not on the raw vectors but on the normalized ones instead:
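import scipy.linalg

def dist_norm(v1, v2):
    # Scale each sparse vector to unit length before comparing them
    v1_normalized = v1 / scipy.linalg.norm(v1.toarray())
    v2_normalized = v2 / scipy.linalg.norm(v2.toarray())
    delta = v1_normalized - v2_normalized
    # Euclidean distance between the two unit-length vectors
    return scipy.linalg.norm(delta.toarray())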
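To see what normalization changes, here is a minimal, dependency-free sketch of the same idea on plain Python lists rather than sparse matrices. The helper names (norm, dist_raw_dense, dist_norm_dense) and the word counts are hypothetical and chosen only for illustration; they are not part of the book's code.

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))

def dist_raw_dense(v1, v2):
    """Euclidean distance between the raw count vectors."""
    return norm([a - b for a, b in zip(v1, v2)])

def dist_norm_dense(v1, v2):
    """Euclidean distance after scaling each vector to unit length."""
    n1, n2 = norm(v1), norm(v2)
    u1 = [a / n1 for a in v1]
    u2 = [b / n2 for b in v2]
    return norm([a - b for a, b in zip(u1, u2)])

# Hypothetical word counts: `tripled` is `single` repeated three times,
# and `unrelated` shares no words with `single`.
single = [1, 1, 0, 2]
tripled = [3, 3, 0, 6]
unrelated = [0, 0, 4, 0]

raw = dist_raw_dense(single, tripled)            # large: counts differ
normed = dist_norm_dense(single, tripled)        # ~0: same direction
orthogonal = dist_norm_dense(single, unrelated)  # sqrt(2): no shared words
```

On the raw counts the repeated text looks far from the original, while on the normalized vectors the distance is zero up to floating-point error. Two count vectors that share no words end up at distance sqrt(2), about 1.41, which is the largest value the normalized distance can take for non-negative counts.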
Calling best_post(X_train, new_post_vec, dist_norm) with this function gives the following distance for each post:
=== Post 0 with dist=1.41:
'This is a toy post about machine learning. Actually, it contains not much interesting stuff.'
=== Post 1 with dist=0.86:
'Imaging databases provide storage capabilities.'
=== Post 2 with dist=0.92:
'Most imaging databases save ...