Using the cosine similarity to quantify bad passwords

In this section, we turn to purely mathematical reasoning to judge password strength. We will use tools from scikit-learn to assess the strength of a password by comparing it to past passwords using vector similarity.

Cosine similarity is a quantitative measure, ranging over [-1, 1], of how similar two vectors are in a vector space. The closer two vectors are in direction, the smaller the angle between them, and the smaller that angle, the larger its cosine; for example:

  • If two vectors are opposites of each other, their angle is 180, and cos(180) = -1.
  • If two vectors are the same, their angle is 0, and cos(0) = 1.
  • If two vectors are perpendicular, their angle is 90, ...
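
The cases listed above can be checked numerically. The following is a minimal sketch in plain Python, not the book's own code: the helper names cosine_similarity and char_count_vector are hypothetical, and representing a password as a vector of character counts is an assumption made only for illustration. scikit-learn offers a function of the same name in sklearn.metrics.pairwise that operates on matrices of vectors.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# The geometric cases from the list above.
same = cosine_similarity([1, 2], [1, 2])            # angle of 0
opposite = cosine_similarity([1, 2], [-1, -2])      # angle of 180
perpendicular = cosine_similarity([1, 0], [0, 1])   # angle of 90


def char_count_vector(password, alphabet):
    """Hypothetical featurization: count how often each character occurs."""
    counts = Counter(password)
    return [counts[ch] for ch in alphabet]


# Illustrative passwords only; these are not taken from any dataset.
known_bad = "password1"
candidates = ["passw0rd", "xk#9Qz"]

# Shared alphabet so that every vector has the same dimensions.
alphabet = sorted(set(known_bad).union(*candidates))
bad_vector = char_count_vector(known_bad, alphabet)

scores = {
    candidate: cosine_similarity(bad_vector, char_count_vector(candidate, alphabet))
    for candidate in candidates
}

for candidate, score in scores.items():
    print(f"{candidate!r} vs {known_bad!r}: {score:.3f}")
```

Under this assumed featurization, a candidate that reuses most of the characters of a known bad password scores close to 1, while a candidate that shares no characters with it scores 0. Character counts ignore ordering, so this is only a rough proxy for similarity.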
