Provided that we know what normal login activity (excluding the hackers) looks like on the site, we can flag values that deviate from this baseline by more than a certain percentage. To calculate the baseline, we could randomly sample a few IP addresses with replacement for each hour and average the number of login attempts they made. We are bootstrapping because we don't have much data: there are only about 40 unique IP addresses to pick from for each of the 24 hours.
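As a rough illustration of the percentage rule, here is a minimal sketch. The helper name `flag_deviations` and the 25% cutoff in the example are hypothetical. It assumes the observed values and the baseline are pandas Series aligned on the same index (for example, hour of day) and that the baseline contains no zeros.

```python
import pandas as pd

def flag_deviations(observed, baseline, pct):
    """Hypothetical helper: True where observed differs from baseline by more than pct.

    `observed` and `baseline` are assumed to be Series sharing an index
    (e.g., hour of day); `pct` is a fraction, so 0.25 means 25%.
    """
    # Relative deviation from the baseline, in either direction
    relative_diff = (observed - baseline).abs() / baseline
    return relative_diff > pct

baseline = pd.Series([2.0, 3.0, 2.5], index=[0, 1, 2])
observed = pd.Series([2.1, 4.0, 2.5], index=[0, 1, 2])
print(flag_deviations(observed, baseline, pct=0.25).tolist())  # [False, True, False]
```

Since brute-force attempts show up as spikes, you may only care about the upward direction, in which case dropping `.abs()` and keeping `(observed - baseline) / baseline > pct` would be enough.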
To calculate these baselines, we could write a function that takes the aggregated DataFrame we just made, along with the name of a statistic to compute for each column; that statistic serves as the starting point for the threshold:
def get_baselines(hourly_ip_logs, func, *args, **kwargs): ...
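One possible way to fill in such a function is sketched below; it is not necessarily how the original is implemented. It assumes `hourly_ip_logs` has a `datetime` column of hourly timestamps plus numeric columns (such as login attempt counts) with one row per IP address per hour. The keyword-only parameters `n` (resample size) and `seed` are hypothetical additions not in the signature above.

```python
import numpy as np
import pandas as pd

def get_baselines(hourly_ip_logs, func, *args, n=10, seed=0, **kwargs):
    """Bootstrap a statistic per column for each hour of the day.

    Assumes `hourly_ip_logs` has a 'datetime' column of hourly timestamps and
    one row per IP address per hour; `n` and `seed` are hypothetical extras.
    """
    # Accept a DataFrame method name such as 'mean' or 'quantile', or a callable
    if isinstance(func, str):
        func = getattr(pd.DataFrame, func)

    hours = hourly_ip_logs['datetime'].dt.hour.rename('hour')
    numeric = hourly_ip_logs.select_dtypes('number')

    # One generator shared across groups, so each hour gets its own random draw
    rng = np.random.default_rng(seed)

    # For each hour, resample n rows with replacement and compute the statistic
    return numeric.groupby(hours).apply(
        lambda group: func(
            group.sample(n, replace=True, random_state=rng), *args, **kwargs
        )
    )
```

For example, `get_baselines(hourly_ip_logs, 'mean')` would give one row of bootstrapped averages per hour, and extra arguments pass straight through, as in `get_baselines(hourly_ip_logs, 'quantile', 0.95)`. Passing a single NumPy `Generator` as `random_state` keeps the results reproducible without every hour drawing the same row positions.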